(57) Abstract: Mutations in genes having no direct relationship to the carotenoid biosynthetic pathway have been found to increase
TITLE

MUTATIONS AFFECTING CAROTENOID PRODUCTION 
This application claims the benefit of U.S. Provisional Application No. 
60/435,612 filed December 19, 2002. 
FIELD OF THE INVENTION 
This invention is in the field of microbiology. More specifically, 
this invention pertains to gene mutations which affect carotenoid production 
levels in microorganisms. 

BACKGROUND OF THE INVENTION

Carotenoids are pigments that are ubiquitous throughout nature and are
synthesized by all oxygen-evolving photosynthetic organisms and by some
heterotrophically growing bacteria and fungi. Industrial uses of carotenoids
include pharmaceuticals, food supplements, electro-optic applications, animal
feed additives, and colorants in cosmetics, to mention a few.

Because animals are unable to synthesize carotenoids de novo, they
must obtain them by dietary means. Thus, manipulation of carotenoid
production and composition in plants or bacteria can provide new or improved
sources of carotenoids.

Carotenoids come in many different forms and chemical structures. 
Most naturally-occurring carotenoids are hydrophobic tetraterpenoids 
containing a C40 methyl-branched hydrocarbon backbone derived from 
successive condensation of eight C5 isoprene units (isopentenyl 
pyrophosphate, IPP). In addition, novel carotenoids with longer or shorter 
backbones occur in some species of nonphotosynthetic bacteria. The term
"carotenoid" actually includes both carotenes and xanthophylls. A "carotene"
refers to a hydrocarbon carotenoid. Carotene derivatives that contain one or
more oxygen atoms, in the form of hydroxy-, methoxy-, oxo-, epoxy-, carboxy-,
or aldehydic functional groups, or within glycosides, glycoside esters, or
sulfates, are collectively known as "xanthophylls". Carotenoids are
furthermore described as being acyclic, monocyclic, or bicyclic depending on
whether the ends of the hydrocarbon backbones have been cyclized to yield
aliphatic or cyclic ring structures (G. Armstrong, (1999) In Comprehensive
Natural Products Chemistry, Elsevier Press, volume 2, pp 321-352).
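The carotene/xanthophyll and acyclic/monocyclic/bicyclic distinctions above amount to a simple classification rule. The Python sketch below is illustrative only: it assumes each compound is described by a plain molecular formula (lycopene and β-carotene are both C40H56; zeaxanthin is C40H56O2) together with a caller-supplied count of cyclized backbone ends. The function names are hypothetical.

```python
import re

# Ring classification by number of cyclized ends of the linear C40 backbone.
RING_CLASSES = {0: "acyclic", 1: "monocyclic", 2: "bicyclic"}


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a simple molecular formula such as 'C40H56O2' into element counts."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def classify_carotenoid(formula: str, cyclized_ends: int) -> tuple[str, str]:
    """Return (carotene or xanthophyll, ring class) for a carotenoid."""
    if cyclized_ends not in RING_CLASSES:
        raise ValueError("a linear backbone has at most two cyclizable ends")
    counts = parse_formula(formula)
    # A hydrocarbon (no oxygen) is a carotene; any oxygen makes it a xanthophyll.
    kind = "xanthophyll" if counts.get("O", 0) > 0 else "carotene"
    return kind, RING_CLASSES[cyclized_ends]
```

For example, `classify_carotenoid("C40H56O2", 2)` returns `("xanthophyll", "bicyclic")` for zeaxanthin, while `classify_carotenoid("C40H56", 0)` returns `("carotene", "acyclic")` for lycopene.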

The genetics of carotenoid pigment biosynthesis are well known
(Armstrong et al., J. Bact., 176:4795-4802 (1994); Annu. Rev. Microbiol.
51:629-659 (1997)). This pathway is extremely well studied in the
Gram-negative, pigmented bacteria of the genus Pantoea, formerly known as
Erwinia. In both E. herbicola EHO-10 (ATCC 39368) and E. uredovora 20D3
(ATCC 19321), the crt genes are clustered in two operons, crtZ and
crtEXYIB (US 5,656,472; US 5,545,816; US 5,530,189; US 5,530,188; and
US 5,429,939). Despite the similarity in operon structure, the DNA
sequences of the E. uredovora and E. herbicola crt genes show no homology by
DNA-DNA hybridization (US 5,429,939).

The building block for carotenoids, IPP, is an isoprenoid. Isoprenoids
constitute the largest class of natural products and serve as
precursors for sterols (eukaryotic membrane stabilizers), gibberellins and
abscisic acid (plant hormones), menaquinone, plastoquinones, and
ubiquinone (used as carriers for electron transport), as well as carotenoids
and the phytol side chain of chlorophyll (pigments for photosynthesis). All
isoprenoids are synthesized via a common metabolic precursor, isopentenyl
pyrophosphate (IPP). Until recently, the biosynthesis of IPP was generally
assumed to proceed exclusively from acetyl-CoA via the classical mevalonate
pathway. However, an alternative mevalonate-independent pathway for IPP
formation has been characterized in eubacteria and a green alga. E. coli
contains genes that encode enzymes of the mevalonate-independent pathway
of isoprenoid biosynthesis (Figure 1). In this pathway, isoprenoid
biosynthesis starts with the condensation of pyruvate with
glyceraldehyde-3-phosphate (G3P) to form deoxy-D-xylulose 5-phosphate via
the enzyme encoded by the dxs gene. A host of additional enzymes then
catalyze subsequent sequential reactions, converting deoxy-D-xylulose
5-phosphate to the final C5 isoprene product, isopentenyl pyrophosphate (IPP).
IPP is converted to its isomer dimethylallyl pyrophosphate (DMAPP) by the
enzyme encoded by the idi gene. IPP is condensed with DMAPP to form C10
geranyl pyrophosphate (GPP), which is then elongated to C15 farnesyl
pyrophosphate (FPP).
FPP synthesis is common to both carotenogenic and non-carotenogenic
bacteria. E. coli does not normally contain the genes necessary
for conversion of FPP to β-carotene (Figure 1). Enzymes in the subsequent
carotenoid pathway used to generate carotenoid pigments from the FPP
precursor can be divided into two categories: carotene backbone synthesis
enzymes and subsequent modification enzymes. The backbone synthesis
enzymes include geranylgeranyl pyrophosphate synthase (CrtE), phytoene
synthase (CrtB), phytoene dehydrogenase (CrtI), and lycopene cyclase
(CrtY/L), etc. The modification enzymes include ketolases, hydroxylases,
dehydratases, glycosylases, etc.
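The chain-length arithmetic in the two paragraphs above (C5 units building C10, C15 and C20 prenyl diphosphates, and two C20 units giving the C40 backbone) can be checked with a short Python sketch. The step list is an illustrative assumption that follows the enzymes named in this application, with IspA assumed to catalyze both the GPP and FPP condensations. It tracks carbon atoms only; the pyrophosphate released at each condensation contains no carbon.

```python
# Illustrative step list: (enzyme, substrates, product).
STEPS = [
    ("Idi", ["IPP"], "DMAPP"),                  # isomerization, C5 -> C5
    ("IspA", ["DMAPP", "IPP"], "GPP"),          # C5 + C5 -> C10
    ("IspA", ["GPP", "IPP"], "FPP"),            # C10 + C5 -> C15
    ("CrtE", ["FPP", "IPP"], "GGPP"),           # C15 + C5 -> C20
    ("CrtB", ["GGPP", "GGPP"], "phytoene"),     # C20 + C20 -> C40
    ("CrtI", ["phytoene"], "lycopene"),         # desaturation, count unchanged
    ("CrtY", ["lycopene"], "beta-carotene"),    # cyclization, count unchanged
    ("CrtZ", ["beta-carotene"], "zeaxanthin"),  # hydroxylation, count unchanged
]


def carbon_counts(steps=STEPS, ipp_carbons: int = 5) -> dict[str, int]:
    """Propagate carbon counts from IPP through each step of the pathway."""
    carbons = {"IPP": ipp_carbons}
    for _enzyme, substrates, product in steps:
        # Condensations add substrate carbons; the other steps conserve them.
        carbons[product] = sum(carbons[s] for s in substrates)
    return carbons


if __name__ == "__main__":
    for name, n in carbon_counts().items():
        print(f"{name}: C{n}")
```

The final C40 count for phytoene and its products corresponds to eight C5 isoprene units, matching the backbone description given earlier.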
Engineering E. coli for increased carotenoid production has previously
focused on overexpression of key isoprenoid pathway genes from multi-copy
plasmids. Various studies have reported between a 1.5X and 50X increase in
carotenoid formation in such E. coli systems upon cloning and transformation
of plasmids encoding isopentenyl diphosphate isomerase (idi), geranylgeranyl
pyrophosphate (GGPP) synthase (gps), deoxy-D-xylulose-5-phosphate (DXP)
synthase (dxs), and DXP reductoisomerase (dxr) from various sources (Kim,
S.-W., and Keasling, J. D., Biotech. Bioeng., 72:408-415 (2001); Mathews, P.
D., and Wurtzel, E. T., Appl. Microbiol. Biotechnol., 53:396-400 (2000);
Harker, M., and Bramley, P. M., FEBS Lett., 448:115-119 (1999); Misawa,
N., and Shimada, H., J. Biotechnol., 59:169-181 (1998); Lrao et al.,
Biotechnol. Bioeng., 62:235-241 (1999); Misawa et al., Biochem. J.,
324:421-426 (1997); and Wang et al., Biotech. Bioeng., 62:235-241 (1999)).
Other attempts to genetically engineer microbial hosts for
increased production of carotenoids have focused on directed evolution of
gps (Wang et al., Biotechnol. Prog., 16:922-926 (2000)) and overexpression
of various isoprenoid and carotenoid biosynthetic genes in different microbial
hosts using endogenous and exogenous promoters (Lagarde et al., Appl.
Env. Microbiol., 66:64-72 (2000); Szkopinska et al., J. Lipid Res., 38:962-968
(1997); Shimada et al., Appl. Env. Microb., 64:2676-2680 (1998); and
Yamano et al., Biosci. Biotech. Biochem., 58:1112-1114 (1994)).

Although these attempts at modulating carotenoid production have had
some positive results, the production increases that can be effected by
modulating pathway enzymes are finite. For example, it has been noted that
increasing the isoprenoid precursor supply appears to be lethal (Sandmann, G.,
Trends in Plant Science, 6:14-17 (2001)), indicating limits on the amount
of carotenoid that can be stored in E. coli. Clearly, other modifications will
be needed to achieve higher levels.

The problem to be solved, therefore, is to create a carotenoid
overproducing organism for the production of new and useful carotenoids that
does not involve direct manipulation of carotenoid or isoprenoid biosynthesis
pathway genes. Applicants have solved the stated problem through the
discovery that mutations in genes not involved in the isoprenoid or carotenoid
biosynthetic pathways have a marked effect in increasing carotenoid
production in a carotenoid producing microorganism.

SUMMARY OF THE INVENTION 
The invention provides a carotenoid overproducing microorganism
comprising the genes encoding a functional isoprenoid enzymatic biosynthetic
pathway and comprising a disrupted gene selected from the group consisting of
deaD, mreC and yfhE. Carotenoid overproducing microorganisms of the
invention will preferably contain:

a) an upper isoprenoid enzymatic biosynthetic pathway comprising
the genes dxs, dxr, ygbP (ispD), ychB (ispE), ygbB (ispF), lytB,
idi, ispA, and ispB; and

b) a lower isoprenoid enzymatic biosynthetic pathway comprising
the genes crtE, crtB, crtI, and crtY, and optionally crtZ and crtW.

In another embodiment the invention provides a carotenoid
overproducing E. coli comprising:

a) an upper isoprenoid enzymatic biosynthetic pathway comprising
the genes dxs, dxr, ygbP (ispD), ychB (ispE), ygbB (ispF), lytB,
idi, ispA, and ispB;

b) a lower isoprenoid enzymatic biosynthetic pathway comprising
the genes crtE, crtB, crtI, and crtY;

c) mutations selected from the group consisting of: a mutation in
the thrS gene as set forth in SEQ ID NO: 35, a mutation in the
rpsA gene as set forth in SEQ ID NO: 37, a mutation in the
rpoC gene as set forth in SEQ ID NO: 38, a mutation in the yjeR
gene as set forth in SEQ ID NO: 39, and a mutation in the rhoL
gene as set forth in SEQ ID NO: 41;
wherein the genes of the lower isoprenoid enzymatic biosynthetic
pathway reside on an autonomously replicating plasmid comprising a replicon
selected from the group consisting of p15A and pMB1.

Additionally, the invention provides a method for the production of a
carotenoid comprising:

a) contacting the carotenoid overproducing microorganism of the
invention with a fermentable carbon substrate;

b) growing the carotenoid overproducing microorganism of step (a)
for a time sufficient to produce a carotenoid; and

c) optionally recovering the carotenoid from the carotenoid
overproducing microorganism of step (b).

BRIEF DESCRIPTION OF THE DRAWINGS AND SEQUENCE DESCRIPTIONS

Figure 1 shows the biosynthetic pathway for production of β-carotene
in E. coli used in the present application.
Figure 2 shows the strategy for mutagenesis and screening of E. coli
chromosomal mutants that increase carotenoid production.

Figure 3 shows β-carotene production in the E. coli mutants created in
the present invention.

Figure 4 shows the genetic organization of the regions of the E. coli
chromosome where transposon insertions were located in the various E. coli
mutants of the present invention.

Figure 5 shows the pPCB15 plasmid encoding carotenoid biosynthetic
genes used in the present application.
The invention can be more fully understood from the following detailed
description and the accompanying sequence descriptions, which form a part
of this application.

The following sequences comply with 37 C.F.R. 1.821-1.825
("Requirements for Patent Applications Containing Nucleotide Sequences
and/or Amino Acid Sequence Disclosures - the Sequence Rules") and are
consistent with World Intellectual Property Organization (WIPO) Standard
ST.25 (1998) and the sequence listing requirements of the EPO and PCT
(Rules 5.2 and 49.5(a-bis), and Section 208 and Annex C of the
Administrative Instructions). The symbols and format used for nucleotide and
amino acid sequence data comply with the rules set forth in 37 C.F.R. §1.822.

Table 1.

Nucleotide and Amino Acid Sequences for Carotenoid Biosynthesis Genes

Gene/Protein Product   Source              Nucleotide SEQ ID NO   Amino Acid SEQ ID NO
CrtE                   Pantoea stewartii   1                      2
CrtX                   Pantoea stewartii   3                      4
CrtY                   Pantoea stewartii   5                      6
CrtI                   Pantoea stewartii   7                      8
CrtB                   Pantoea stewartii   9                      10
CrtZ                   Pantoea stewartii   11                     12

SEQ ID NOs:13-14 are oligonucleotide primers used to amplify the
carotenoid biosynthesis genes from P. stewartii.

SEQ ID NOs:15-16 are oligonucleotide primers used to identify the 
location of transposon insertions. 
SEQ ID NOs:17-18 are oligonucleotide primers used to sequence the
products amplified by SEQ ID NOs:15-16.

SEQ ID NOs:19-34 are oligonucleotide primers used to confirm 
transposon insertion sites. 

SEQ ID NO: 35 is the nucleotide sequence of the mutated thrS gene 
with the Tn5 insertion. 

SEQ ID NO: 36 is the nucleotide sequence of the mutated deaD gene 
with the Tn5 insertion. 

SEQ ID NO: 37 is the nucleotide sequence of the mutated rpsA gene
with the Tn5 insertion. 

SEQ ID NO: 38 is the nucleotide sequence of the mutated rpoC gene 
with the Tn5 insertion. 

SEQ ID NO: 39 is the nucleotide sequence of the mutated yjeR gene 
with the Tn5 insertion.

SEQ ID NO: 40 is the nucleotide sequence of the mutated mreC gene 
with the Tn5 insertion. 

SEQ ID NO: 41 is the nucleotide sequence of the mutated rhoL gene 
with the Tn5 insertion. 

SEQ ID NO: 42 is the nucleotide sequence of the mutated hscB (yfhE) 
gene with the Tn5 insertion. 

SEQ ID NO: 43 is the nucleotide sequence for the reporter plasmid
pPCB15.

DETAILED DESCRIPTION OF THE INVENTION 
The invention relates to the discovery that mutations in certain genes that
are not part of the isoprenoid or carotenoid biosynthetic pathways have the
effect of increasing carotenoid production. Carotenoid overproducing
microorganisms are those that either naturally possess a complete pathway
or have had the pathway engineered by recombinant technology.

In this disclosure, a number of terms and abbreviations are used. The 
following definitions are provided. 

"Open reading frame" is abbreviated ORF. 

"Polymerase chain reaction" is abbreviated PCR. 

The term "p15A" refers to a replicon for a family of plasmid vectors
including pACYC-based vectors.

The term "pMB1" refers to a replicon for a family of plasmid vectors
including pUC- and pBR-based vectors.
The term "replicon" refers to a genetic element that behaves as an
autonomous unit during replication. It contains sequences controlling
replication of a plasmid, including its origin of replication.

The term "isoprenoid" or "terpenoid" refers to the compounds and any
molecules derived from the isoprenoid pathway, including 10-carbon
terpenoids and their derivatives, such as carotenoids and xanthophylls.

The "Isoprenoid Pathway" as used herein refers to the enzymatic
pathway that is responsible for the production of isoprenoids. At a minimum
the isoprenoid pathway contains the genes dxs, dxr, ygbP, ychB, ygbB, lytB,
idi, ispA, and ispB, which may also be referred to herein as the "Upper
Isoprenoid Pathway" or "Upper Pathway". The "Carotenoid Biosynthetic
Pathway" or "Lower Isoprenoid Pathway" or "Lower Pathway" refers to the
genes encoding enzymes necessary for the production of carotenoid
compounds and includes, but is not limited to, crtE, crtB, crtI, crtY, crtX,
and crtZ.

The term "carotenoid biosynthetic enzyme" is an inclusive term
referring to any and all of the enzymes encoded by the Pantoea crtEXYIB
cluster. The enzymes include CrtE, CrtY, CrtI, CrtB, and CrtX.

A "disrupted gene" refers to a gene having a deletion or addition in the
coding region of the gene such that there is a complete loss of the phenotype
associated with that gene.

The term "dxs" refers to the enzyme D-1-deoxyxylulose 5-phosphate
synthase encoded by the E. coli dxs gene, which catalyzes the condensation
of pyruvate and D-glyceraldehyde 3-phosphate to D-1-deoxyxylulose
5-phosphate.

The term "idi" refers to the enzyme isopentenyl diphosphate isomerase 
encoded by the E. coli idi gene that converts isopentenyl diphosphate to 
dimethylallyl diphosphate. 

The term "pPCB15" refers to the plasmid containing the β-carotene
biosynthesis genes Pantoea crtEXYIB. The plasmid was used as a reporter
plasmid for monitoring β-carotene production in E. coli genetically engineered
via the invented method (SEQ ID NO: 43).

The term "E. coli" refers to Escherichia coli strain K-12 derivatives,
such as MG1655 (ATCC 47076).

The term "Pantoea stewartii" will be used interchangeably with
Erwinia stewartii (Mergaert et al., Int. J. Syst. Bacteriol., 43:162-173 (1993)).

The term "Pantoea ananatis" is used interchangeably with Erwinia
uredovora (Mergaert et al., Int. J. Syst. Bacteriol., 43:162-173 (1993)).
The term "Pantoea crtEXYIB cluster" refers to a gene cluster
containing the carotenoid synthesis genes crtEXYIB amplified from Pantoea
stewartii ATCC 8199. The gene cluster contains the genes crtE, crtX, crtY,
crtI, and crtB. The cluster also contains a crtZ gene organized in the
opposite orientation, adjacent to the crtB gene.

The term "CrtE" refers to geranylgeranyl pyrophosphate synthase 
enzyme encoded by crtE gene which converts trans-trans-farnesyl 
diphosphate + isopentenyl diphosphate to pyrophosphate + geranylgeranyl 
diphosphate. 

The term "CrtY" refers to the lycopene cyclase enzyme encoded by the
crtY gene, which converts lycopene to β-carotene.

The term "CrtI" refers to the phytoene dehydrogenase enzyme encoded by
the crtI gene, which converts phytoene into lycopene via the intermediates
phytofluene, zeta-carotene, and neurosporene by the introduction of four
double bonds.

The term "CrtB" refers to the phytoene synthase enzyme encoded by the
crtB gene, which catalyzes the conversion of geranylgeranyl pyrophosphate,
via prephytoene diphosphate, to phytoene.

The term "CrtX" refers to the zeaxanthin glucosyl transferase enzyme
encoded by the crtX gene, which converts zeaxanthin to
zeaxanthin-β-diglucoside.

The term "CrtZ" refers to the β-carotene hydroxylase enzyme encoded
by the crtZ gene, which catalyzes the hydroxylation of β-carotene to
zeaxanthin.

The term "thrS" refers to the threonyl-tRNA synthetase gene locus.

The term "deaD" refers to the RNA helicase gene locus.

The term "rpsA" refers to the 30S ribosomal subunit protein S1 gene
locus.

The term "rpoC" refers to the RNA polymerase β' subunit gene locus.

The term "yjeR" refers to the oligoribonuclease gene locus.

The term "mreC" refers to the rod-shape determining protein gene
locus.

The term "rhoL" refers to the rho operon leader peptide gene locus.

The terms "hscB" or "yfhE" refer to the heat shock cognate protein
gene locus.

As used herein, an "isolated nucleic acid fragment" is a polymer of
RNA or DNA that is single- or double-stranded, optionally containing
synthetic, non-natural or altered nucleotide bases. An isolated nucleic acid
fragment in the form of a polymer of DNA may be comprised of one or more
segments of cDNA, genomic DNA or synthetic DNA.

The term "complementary" is used to describe the relationship between
nucleotide bases that are capable of hybridizing to one another. For example,
with respect to DNA, adenosine is complementary to thymine and cytosine is
complementary to guanine.
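Complementarity is usually applied to whole strands, which pair in antiparallel orientation, so the strand that hybridizes perfectly to a given DNA sequence is its reverse complement. A minimal Python sketch, assuming input that contains only A, C, G and T (IUPAC ambiguity codes are not handled); the function names are hypothetical.

```python
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Base-by-base complement, read in the same direction as the input."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """Complementary strand written 5' to 3' (antiparallel partner)."""
    return complement(seq)[::-1]


def fully_complementary(strand_a: str, strand_b: str) -> bool:
    """True if the two strands could form a perfectly paired duplex."""
    return reverse_complement(strand_a) == strand_b.upper()
```

For example, `reverse_complement("ATGC")` returns `"GCAT"`.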

"Codon degeneracy" refers to the nature of the genetic code permitting
variation of the nucleotide sequence without affecting the amino acid
sequence of an encoded polypeptide. The skilled artisan is well aware of the
"codon bias" exhibited by a specific host cell in usage of nucleotide codons to
specify a given amino acid. Therefore, when synthesizing a gene for
improved expression in a host cell, it is desirable to design the gene such that
its frequency of codon usage approaches the frequency of preferred codon
usage of the host cell.
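A host's codon preferences can be estimated by tallying codons across its known coding sequences. The Python sketch below is one simple way to do this, assuming a list of in-frame coding sequences that contain only A, C, G and T. For each amino acid (stop codons as "*") it reports the fraction contributed by each synonymous codon, using the standard genetic code. The function name is hypothetical.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_usage(coding_sequences: list[str]) -> dict[str, dict[str, float]]:
    """Relative synonymous codon frequencies pooled over in-frame host CDSs."""
    codon_counts = Counter()
    for cds in coding_sequences:
        cds = cds.upper()
        # Read complete codons only; a trailing partial codon is ignored.
        for pos in range(0, len(cds) - 2, 3):
            codon_counts[cds[pos:pos + 3]] += 1

    # Total codons observed for each amino acid.
    aa_totals = Counter()
    for codon, n in codon_counts.items():
        aa_totals[CODON_TABLE[codon]] += n

    usage: dict[str, dict[str, float]] = {}
    for codon, n in codon_counts.items():
        aa = CODON_TABLE[codon]
        usage.setdefault(aa, {})[codon] = n / aa_totals[aa]
    return usage
```

For a single toy sequence such as `"ATGCTGCTGCTATAA"`, leucine is encoded by CTG twice and CTA once, so CTG receives a frequency of 2/3. A real survey would pool many host genes.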

"Synthetic genes" can be assembled from oligonucleotide building
blocks that are chemically synthesized using procedures known to those
skilled in the art. These building blocks are annealed and ligated to form
gene segments, which are then enzymatically assembled to construct the
entire gene. "Chemically synthesized", as related to a sequence of DNA,
means that the component nucleotides were assembled in vitro. Manual
chemical synthesis of DNA may be accomplished using well-established
procedures, or automated chemical synthesis can be performed using one of
a number of commercially available machines. Accordingly, the genes can be
tailored for optimal gene expression based on optimization of nucleotide
sequence to reflect the codon bias of the host cell. The skilled artisan
appreciates the likelihood of successful gene expression if codon usage is
biased towards those codons favored by the host. Determination of preferred
codons can be based on a survey of genes derived from the host cell where
sequence information is available.
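One simple way to bias a synthetic gene toward host-preferred codons is to back-translate the protein, choosing for each residue the synonymous codon used most often in the host. The Python sketch below assumes a preference table of the kind a survey of host genes would produce. The frequencies in `EXAMPLE_USAGE` are hypothetical placeholders, not measured values for E. coli or any other host. Practical gene design usually weighs further constraints (for example, avoiding unwanted restriction sites or repeats), which this greedy sketch does not attempt.

```python
def back_translate(protein: str, usage: dict[str, dict[str, float]]) -> str:
    """Pick the most frequent host codon for each residue ('*' = stop)."""
    codons = []
    for residue in protein.upper():
        choices = usage[residue]  # KeyError if the table lacks this residue
        codons.append(max(choices, key=choices.get))
    return "".join(codons)


# Hypothetical, partial preference table covering only the residues used below.
EXAMPLE_USAGE = {
    "M": {"ATG": 1.0},
    "L": {"CTG": 0.5, "TTA": 0.15, "CTC": 0.1},
    "K": {"AAA": 0.75, "AAG": 0.25},
    "*": {"TAA": 0.6, "TGA": 0.3, "TAG": 0.1},
}

if __name__ == "__main__":
    print(back_translate("MLK*", EXAMPLE_USAGE))  # ATGCTGAAATAA
```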

"Gene" refers to a nucleic acid fragment that expresses a specific
protein, including regulatory sequences preceding (5' non-coding sequences)
and following (3' non-coding sequences) the coding sequence. "Native gene"
refers to a gene as found in nature with its own regulatory sequences.
"Chimeric gene" refers to any gene that is not a native gene, comprising
regulatory and coding sequences that are not found together in nature.
Accordingly, a chimeric gene may comprise regulatory sequences and coding
sequences that are derived from different sources, or regulatory sequences
and coding sequences derived from the same source, but arranged in a
manner different from that found in nature. "Endogenous gene" refers to a
native gene in its natural location in the genome of an organism. A "foreign"
gene refers to a gene not normally found in the host organism, but that is
introduced into the host organism by gene transfer. Foreign genes can
comprise native genes inserted into a non-native organism, or chimeric
genes. A "transgene" is a gene that has been introduced into the genome by
a transformation procedure.

"Operon", in bacterial DNA, is a cluster of contiguous genes 
transcribed from one promoter that gives rise to a polycistronic mRNA. 

"Coding sequence" refers to a DNA sequence that codes for a specific
amino acid sequence. "Suitable regulatory sequences" refer to nucleotide
sequences located upstream (5' non-coding sequences), within, or
downstream (3' non-coding sequences) of a coding sequence, and which
influence the transcription, RNA processing or stability, or translation of the
associated coding sequence. Regulatory sequences may include promoters,
translation leader sequences, introns, polyadenylation recognition sequences,
RNA processing sites, effector binding sites and stem-loop structures.
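Reading a coding sequence codon by codon shows how it specifies an amino acid sequence. The Python sketch below translates an in-frame DNA coding sequence with the standard genetic code, which assigns amino acids in the same way as the bacterial code used for E. coli. It assumes the sequence begins at the first base of the start codon and contains only A, C, G and T, and it ignores the flanking regulatory sequences described above. The function name is hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str, stop_at_stop: bool = True) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    cds = cds.upper()
    protein = []
    # Step through complete codons; a trailing partial codon is ignored.
    for pos in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[pos:pos + 3]]
        if residue == "*" and stop_at_stop:
            break
        protein.append(residue)
    return "".join(protein)
```

For example, `translate("ATGCTGAAATAAGGG")` returns `"MLK"`, because translation stops at the TAA codon.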

"Promoter" refers to a DNA sequence capable of controlling the
expression of a coding sequence or functional RNA. In general, a coding
sequence is located 3' to a promoter sequence. Promoters may be derived in
their entirety from a native gene, or be composed of different elements
derived from different promoters found in nature, or even comprise synthetic
DNA segments. It is understood by those skilled in the art that different
promoters may direct the expression of a gene in different tissues or cell
types, or at different stages of development, or in response to different
environmental or physiological conditions. Promoters which cause a gene to
be expressed in most cell types at most times are commonly referred to as
"constitutive promoters". It is further recognized that since in most cases the
exact boundaries of regulatory sequences have not been completely defined,
DNA fragments of different lengths may have identical promoter activity.

The "3' non-coding sequences" refer to DNA sequences located 
downstream of a coding sequence encoding regulatory signals capable of 
affecting mRNA processing or gene expression. 

"RNA transcript" refers to the product resulting from RNA
polymerase-catalyzed transcription of a DNA sequence. When the RNA
transcript is a perfect complementary copy of the DNA sequence, it is referred
to as the primary transcript; an RNA sequence derived from post-transcriptional
processing of the primary transcript is referred to as the mature RNA.
"Messenger RNA (mRNA)" refers to the RNA that is without introns and that
can be translated into protein by the cell. "cDNA" refers to a double-stranded
DNA that is complementary to and derived from mRNA. "Sense" RNA refers to
an RNA transcript that includes the mRNA and so can be translated into protein
by the cell. "Antisense RNA" refers to an RNA transcript that is complementary
to all or part of a target primary transcript or mRNA and that blocks the
expression of a target gene (U.S. Patent No. 5,107,065; WO 9928508). The
complementarity of an antisense RNA may be with any part of the specific gene
transcript, i.e., at the 5' non-coding sequence, 3' non-coding sequence, or the
coding sequence. "Functional RNA" refers to antisense RNA, ribozyme RNA, or
other RNA that is not translated yet has an effect on cellular processes.

The term "operably linked" refers to the association of nucleic acid
sequences on a single nucleic acid fragment so that the function of one is
affected by the other. For example, a promoter is operably linked with a
coding sequence when it is capable of affecting the expression of that coding
sequence (i.e., the coding sequence is under the transcriptional control of
the promoter). Coding sequences can be operably linked to regulatory
sequences in sense or antisense orientation.

The term "expression", as used herein, refers to the transcription and
stable accumulation of sense (mRNA) or antisense RNA derived from the
nucleic acid fragment of the invention. Expression may also refer to
translation of mRNA into a polypeptide.

"Transformation" refers to the transfer of a nucleic acid fragment into
the genome of a host organism, resulting in genetically stable inheritance.
Host organisms containing the transformed nucleic acid fragments are
referred to as "transgenic" or "recombinant" or "transformed" organisms.

The terms "plasmid", "vector" and "cassette" refer to an extrachromosomal
element often carrying genes which are not part of the central
metabolism of the cell, and usually in the form of circular double-stranded
DNA fragments. Such elements may be autonomously replicating
sequences, genome integrating sequences, phage or nucleotide sequences,
linear or circular, of a single- or double-stranded DNA or RNA, derived from
any source, in which a number of nucleotide sequences have been joined or
recombined into a unique construction which is capable of introducing a
promoter fragment and DNA sequence for a selected gene product along with
appropriate 3' untranslated sequence into a cell. "Transformation cassette"
refers to a specific vector containing a foreign gene and having elements in
addition to the foreign gene that facilitate transformation of a particular host
cell. "Expression cassette" refers to a specific vector containing a foreign
gene and having elements in addition to the foreign gene that allow for
enhanced expression of that gene in a foreign host.
The term "fermentable carbon substrate" refers to the carbon source
metabolized by a carotenoid overproducing microorganism. Typically,
fermentable carbon substrates will include, but are not limited to, carbon
sources selected from the group consisting of monosaccharides,
oligosaccharides, polysaccharides, and one-carbon substrates or mixtures
thereof.

The term "carotenoid overproducing microorganism" refers to a
microorganism of the invention which has been genetically modified by the
up-regulation or down-regulation of various genes to produce a carotenoid
compound at levels greater than the wild-type or unmodified host.
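Whether a strain meets this definition comes down to comparing its carotenoid level with that of the unmodified host grown under the same conditions; the fold increases cited in the background are ratios of this kind. A minimal Python sketch, assuming both levels are expressed in the same units (for example, carotenoid per unit of cell mass); the function names are hypothetical.

```python
def fold_increase(strain_level: float, host_level: float) -> float:
    """Ratio of carotenoid level in the modified strain to the unmodified host."""
    if host_level <= 0:
        raise ValueError("host level must be positive to form a ratio")
    return strain_level / host_level


def is_overproducer(strain_level: float, host_level: float) -> bool:
    """True if the strain produces more carotenoid than the unmodified host."""
    return fold_increase(strain_level, host_level) > 1.0
```

For example, a strain at 3.0 units against a host at 2.0 units gives `fold_increase(3.0, 2.0) == 1.5`. In practice such a comparison would rest on replicate measurements rather than single values.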

The term "sequence analysis software" refers to any computer
algorithm or software program that is useful for the analysis of nucleotide or
amino acid sequences. "Sequence analysis software" may be commercially
available or independently developed. Typical sequence analysis software
will include, but is not limited to, the GCG suite of programs (Wisconsin
Package Version 9.0, Genetics Computer Group (GCG), Madison, WI),
BLASTP, BLASTN, BLASTX (Altschul et al., J. Mol. Biol., 215:403-410 (1990)),
DNASTAR (DNASTAR, Inc., 1228 S. Park St., Madison, WI 53715 USA),
and the FASTA program incorporating the Smith-Waterman algorithm (W. R.
Pearson, Comput. Methods Genome Res., [Proc. Int. Symp.] (1994), Meeting
Date 1992, 111-20. Editor(s): Suhai, Sandor. Publisher: Plenum, New York,
NY). Within the context of this application it will be understood that where
sequence analysis software is used for analysis, the results of the
analysis will be based on the "default values" of the program referenced,
unless otherwise specified. As used herein, "default values" will mean any set
of values or parameters which originally load with the software when first
initialized.
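The FASTA program named above incorporates the Smith-Waterman local alignment algorithm. To illustrate what such a program computes, the following Python sketch returns only the best local alignment score between two sequences. The scoring values (match +2, mismatch -1, linear gap -2) are illustrative assumptions, not the default values of FASTA or of any other program listed here; production tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty.

    Uses two rolling rows of the dynamic-programming matrix, so memory is
    O(len(b)) and time is O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, `smith_waterman_score("TTACGTTT", "GGACGTGG")` returns 8, the score of the shared ACGT block, and `smith_waterman_score("ACGTACGT", "ACGTTACGT")` returns 14 (eight matches and one gap).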

Standard recombinant DNA and molecular cloning techniques used
here are well known in the art and are described by Sambrook, J., Fritsch,
E. F. and Maniatis, T., Molecular Cloning: A Laboratory Manual, Second
Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1989)
(hereinafter "Maniatis"); by Silhavy, T. J., Berman, M. L. and Enquist, L.
W., Experiments with Gene Fusions, Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, NY (1984); and by Ausubel, F. M. et al., Current
Protocols in Molecular Biology, published by Greene Publishing Assoc. and
Wiley-Interscience (1987).

The present invention relates to microorganisms that produce
carotenoid compounds and methods for increasing carotenoid production in
microorganisms having a functional isoprenoid biosynthetic pathway.
Specifically, mutations in genes having no direct relationship to the
carotenoid biosynthetic pathway have been found to increase carbon flux
through that pathway. For example, complete disruption of the deaD, mreC
or yfhE genes was effective at increasing the production of carotenoid from
an engineered host. Additionally, where genes of the lower carotenoid
pathway reside on a plasmid having either a p15A or pMB1 replicon,
mutations in the thrS, rpsA, rpoC, yjeR, and rhoL genes were found
to be similarly effective.
Genes Involved in Carotenoid Production. 

The enzyme pathway involved in the biosynthesis of carotenoids can
be conveniently viewed in two parts: the upper isoprenoid pathway, providing
for the conversion of pyruvate and glyceraldehyde-3-phosphate to farnesyl
pyrophosphate, and the lower carotenoid biosynthetic pathway, which provides
for the synthesis of phytoene and all subsequently produced carotenoids.
The upper pathway is present in many microorganisms. In the present
invention it will only be necessary to introduce genes that comprise the lower
pathway for the biosynthesis of the desired carotenoid. The key division
between the two pathways concerns the synthesis of farnesyl pyrophosphate
(FPP). Where FPP is naturally present, only elements of the lower carotenoid
pathway will be needed. However, it will be appreciated that for the lower
pathway carotenoid genes to be effective in the production of carotenoids,
the host cell must have suitable levels of FPP. Where FPP synthesis is not
provided by the host cell, it will be necessary to introduce the genes
necessary for the production of FPP. Each of these pathways will be
discussed below in detail.
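The decision described above (introduce only the lower pathway when the host already makes FPP, and the upper-pathway genes as well when it does not) can be written as a small helper. This Python sketch is hypothetical. It treats gene presence as a simple set-membership test, uses the upper-pathway gene list defined in this application, and takes crtE, crtB, crtI and crtY as an example lower pathway for β-carotene.

```python
# Upper pathway as defined in this application; lower pathway is an example
# gene set for beta-carotene production.
UPPER_PATHWAY = ("dxs", "dxr", "ygbP", "ychB", "ygbB", "lytB", "idi", "ispA", "ispB")
LOWER_PATHWAY_BETA_CAROTENE = ("crtE", "crtB", "crtI", "crtY")


def genes_to_introduce(host_genes: set[str], host_makes_fpp: bool,
                       lower_pathway: tuple[str, ...] = LOWER_PATHWAY_BETA_CAROTENE
                       ) -> list[str]:
    """Return the pathway genes a host would still need (hypothetical helper)."""
    needed = [g for g in lower_pathway if g not in host_genes]
    if not host_makes_fpp:
        # Without native FPP synthesis, missing upper-pathway genes are added too.
        needed = [g for g in UPPER_PATHWAY if g not in host_genes] + needed
    return needed
```

For an E. coli host, which synthesizes FPP but normally lacks the genes for converting FPP to β-carotene, `genes_to_introduce(set(UPPER_PATHWAY), True)` returns only the four crt genes.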
The Upper Isoprenoid Pathway 

Isopentenyl pyrophosphate (IPP) biosynthesis occurs through either of two pathways. First, IPP may be synthesized through the well-known acetate/mevalonate pathway. However, recent studies have demonstrated that the mevalonate-dependent pathway does not operate in all living organisms. An alternate mevalonate-independent pathway for IPP biosynthesis has been characterized in bacteria, green algae, and higher plants (Horbach et al., FEMS Microbiol. Lett., 111:135-140 (1993); Rohmer et al., Biochem. J., 295:517-524 (1993); Schwender et al., Biochem. J., 316:73-80 (1996); and Eisenreich et al., Proc. Natl. Acad. Sci. USA, 93:6431-6436 (1996)).

Many steps in both isoprenoid pathways are known (Figure 1). For example, the initial steps of the alternate pathway leading to the production of IPP have been studied in Mycobacterium tuberculosis by Cole et al. (Nature, 393:537-544 (1998)). The first step of the pathway involves the condensation of two 3-carbon molecules (pyruvate and D-glyceraldehyde 3-phosphate) to yield a 5-carbon compound known as D-1-deoxyxylulose-5-phosphate. This reaction is catalyzed by the DXS enzyme, encoded by the dxs gene. Next, the isomerization and reduction of D-1-deoxyxylulose-5-phosphate yields 2-C-methyl-D-erythritol-4-phosphate. One of the enzymes involved in the isomerization and reduction process is D-1-deoxyxylulose-5-phosphate reductoisomerase (DXR), encoded by the gene dxr. 2-C-methyl-D-erythritol-4-phosphate is subsequently converted into 4-diphosphocytidyl-2C-methyl-D-erythritol in a CTP-dependent reaction by the enzyme encoded by the previously non-annotated gene ygbP. Recently, however, the ygbP gene was renamed ispD as part of the isp gene cluster (SwissProtein Accession #Q46893).

Next, the 2nd position hydroxy group of 4-diphosphocytidyl-2C-methyl-D-erythritol can be phosphorylated in an ATP-dependent reaction by the enzyme encoded by the ychB gene. This enzyme phosphorylates 4-diphosphocytidyl-2C-methyl-D-erythritol, resulting in 4-diphosphocytidyl-2C-methyl-D-erythritol 2-phosphate. The ychB gene was renamed ispE, also as part of the isp gene cluster (SwissProtein Accession #P24209). Finally, the enzyme encoded by the ygbB gene converts 4-diphosphocytidyl-2C-methyl-D-erythritol 2-phosphate to 2C-methyl-D-erythritol 2,4-cyclodiphosphate in a CTP-dependent manner. This gene has also been recently renamed and belongs to the isp gene cluster; the new name for the ygbB gene is ispF (SwissProtein Accession #P36663).

It is known that 2C-methyl-D-erythritol 2,4-cyclodiphosphate can be further converted into IPP to ultimately produce carotenoids in the carotenoid biosynthesis pathway. However, the reactions leading to the production of isopentenyl monophosphate from 2C-methyl-D-erythritol 2,4-cyclodiphosphate are not yet well characterized. The enzymes encoded by the lytB and gcpE genes (and perhaps others) are thought to participate in the reactions leading to formation of isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP).

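
The sequence of MEP-pathway steps just described can be captured as an ordered list of (gene, substrate, product) records. The following is a minimal Python sketch, not part of the original disclosure: the `Step` class and helper names are hypothetical, the compound names are copied from the text, and the lytB/gcpE steps are left out because the text describes them as not yet well characterized. The `is_contiguous` check simply confirms that each product feeds the next step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One MEP-pathway conversion as described in the text (hypothetical record type)."""
    gene: str       # current gene name, with the older name in parentheses
    substrate: str
    product: str


# Ordered steps from the text. The lytB/gcpE steps are omitted because
# the text describes them as not yet well characterized.
MEP_STEPS = [
    Step("dxs", "pyruvate + D-glyceraldehyde 3-phosphate",
         "D-1-deoxyxylulose-5-phosphate"),
    Step("dxr", "D-1-deoxyxylulose-5-phosphate",
         "2-C-methyl-D-erythritol-4-phosphate"),
    Step("ispD (ygbP)", "2-C-methyl-D-erythritol-4-phosphate",
         "4-diphosphocytidyl-2C-methyl-D-erythritol"),
    Step("ispE (ychB)", "4-diphosphocytidyl-2C-methyl-D-erythritol",
         "4-diphosphocytidyl-2C-methyl-D-erythritol 2-phosphate"),
    Step("ispF (ygbB)", "4-diphosphocytidyl-2C-methyl-D-erythritol 2-phosphate",
         "2C-methyl-D-erythritol 2,4-cyclodiphosphate"),
]


def is_contiguous(steps):
    """True if every product is consumed as the substrate of the next step."""
    return all(a.product == b.substrate for a, b in zip(steps, steps[1:]))


def gene_order(steps):
    """Gene names in pathway order."""
    return [s.gene for s in steps]


if __name__ == "__main__":
    print("contiguous:", is_contiguous(MEP_STEPS))
    for step in MEP_STEPS:
        print(f"{step.gene:12s} {step.substrate} -> {step.product}")
```

Keeping the steps as data rather than prose makes it easy to spot a break in the chain if a step is renamed or reordered.
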
IPP may be isomerized to DMAPP via IPP isomerase, encoded by the idi gene; however, this enzyme is not essential for survival and may be absent in some bacteria using the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway. Recent evidence suggests that the MEP pathway branches before IPP and separately produces IPP and DMAPP via the lytB gene product. A lytB knockout mutation is lethal in E. coli except in media supplemented with both IPP and DMAPP.

The synthesis of FPP occurs via isomerization of IPP to dimethylallyl pyrophosphate (DMAPP). This reaction is followed by a sequence of two prenyltransferase reactions catalyzed by ispA, leading to the creation of geranyl pyrophosphate (GPP; a 10-carbon molecule) and farnesyl pyrophosphate (FPP; a 15-carbon molecule).
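
Because each prenyltransferase step adds one C5 unit from IPP, the chain lengths quoted above follow from simple carbon bookkeeping. The sketch below is an illustrative Python calculation (the names are hypothetical) that tracks carbon atoms only and ignores the pyrophosphate group.

```python
IPP_CARBONS = 5    # isopentenyl pyrophosphate: one C5 isoprene unit
DMAPP_CARBONS = 5  # dimethylallyl pyrophosphate: the C5 isomer of IPP


def prenyl_transfer(acceptor_carbons: int) -> int:
    """One head-to-tail prenyltransferase step adds one C5 unit from IPP."""
    return acceptor_carbons + IPP_CARBONS


gpp_carbons = prenyl_transfer(DMAPP_CARBONS)  # first ispA step: DMAPP + IPP -> GPP
fpp_carbons = prenyl_transfer(gpp_carbons)    # second ispA step: GPP + IPP -> FPP

if __name__ == "__main__":
    print("GPP:", gpp_carbons, "carbons")  # 10
    print("FPP:", fpp_carbons, "carbons")  # 15
```
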

Genes encoding elements of the upper pathway are known from a 
variety of plant, animal, and bacterial sources, as shown in Table 2. 
Table 2

Sources of Genes Encoding the Upper Isoprene Pathway

Gene
    GenBank® Accession Number and Source Organism

dxs (D-1-deoxyxylulose 5-phosphate synthase)
    AF035440, Escherichia coli
    Y18874, Synechococcus PCC6301
    AB026631, Streptomyces sp. CL190
    AB042821, Streptomyces griseolosporeus
    AF111814, Plasmodium falciparum
    AF143812, Lycopersicon esculentum
    AJ279019, Narcissus pseudonarcissus
    AJ291721, Nicotiana tabacum

dxr (1-deoxy-D-xylulose 5-phosphate reductoisomerase)
    AB013300, Escherichia coli
    AB049187, Streptomyces griseolosporeus
    AF111813, Plasmodium falciparum
    AF116825, Mentha x piperita
    AF148852, Arabidopsis thaliana
    AF182287, Artemisia annua
    AF250235, Catharanthus roseus
    AF282879, Pseudomonas aeruginosa
    AJ242588, Arabidopsis thaliana
    AJ250714, Zymomonas mobilis strain ZM4
    AJ292312, Klebsiella pneumoniae
    AJ297566, Zea mays

ispD (2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase)
    AB037876, Arabidopsis thaliana
    AF109075, Clostridium difficile
    AF230736, Escherichia coli
    AF230737, Arabidopsis thaliana

ispE (4-diphosphocytidyl-2-C-methyl-D-erythritol kinase)
    AF216300, Escherichia coli
    AF263101, Lycopersicon esculentum
    AF288615, Arabidopsis thaliana

ispF (2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase)
    AB038256, Escherichia coli mecs gene
    AF230738, Escherichia coli
    AF250236, Catharanthus roseus (MECS)
    AF279661, Plasmodium falciparum
    AF321531, Arabidopsis thaliana

lytB
    AF027189, Acinetobacter sp. BD413
    AF098521, Burkholderia pseudomallei
    AF291696, Streptococcus pneumoniae
    AF323927, Plasmodium falciparum
    M87645, Bacillus subtilis
    U38915, Synechocystis sp.
    X89371, Campylobacter jejuni

gcpE (1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase)
    O67496, Aquifex aeolicus
    P54482, Bacillus subtilis
    Q9PKY3, Chlamydia muridarum
    Q9Z8H0, Chlamydophila pneumoniae
    O84060, Chlamydia trachomatis
    P27433, Escherichia coli
    P44667, Haemophilus influenzae
    Q9ZLL0, Helicobacter pylori J99
    O33350, Mycobacterium tuberculosis
    S77159, Synechocystis sp.
    Q9WZZ3, Thermotoga maritima
    O83460, Treponema pallidum
    Q9JZ40, Neisseria meningitidis
    Q9PPM1, Campylobacter jejuni
    Q9RXC9, Deinococcus radiodurans
    AAG07190, Pseudomonas aeruginosa
    Q9KTX1, Vibrio cholerae

ispA (FPP synthase)
    AB003187, Micrococcus luteus
    AB016094, Synechococcus elongatus
    AB021747, Oryza sativa FPPS1 gene for farnesyl diphosphate synthase
    AB028044, Rhodobacter sphaeroides
    AB028046, Rhodobacter capsulatus
    AB028047, Rhodovulum sulfidophilum
    AF112881 and AF136602, Artemisia annua
    AF384040, Mentha x piperita
    D00694, Escherichia coli
    D13293, B. stearothermophilus
    D85317, Oryza sativa
    X75789, Arabidopsis thaliana
    Y12072, G. arboreum
    Z49786, H. brasiliensis
    U80605, Arabidopsis thaliana farnesyl diphosphate synthase precursor (FPS1) mRNA, complete cds
    X76026, K. lactis FPS gene for farnesyl diphosphate synthetase, QCR8 gene for bc1 complex, subunit VIII
    X82542, P. argentatum mRNA for farnesyl diphosphate synthase (FPS1)
    X82543, P. argentatum mRNA for farnesyl diphosphate synthase (FPS2)
    BC010004, Homo sapiens, farnesyl diphosphate synthase (farnesyl pyrophosphate synthetase, dimethylallyltranstransferase, geranyltranstransferase), clone MGC 15352 IMAGE 4132071, mRNA, complete cds
    AF234168, Dictyostelium discoideum farnesyl diphosphate synthase (Dfps)
    L46349, Arabidopsis thaliana farnesyl diphosphate synthase (FPS2) mRNA, complete cds
    L46350, Arabidopsis thaliana farnesyl diphosphate synthase (FPS2) gene, complete cds
    L46367, Arabidopsis thaliana farnesyl diphosphate synthase (FPS1) gene, alternative products, complete cds
    M89945, Rat farnesyl diphosphate synthase gene, exons 1-8
    NM_002004, Homo sapiens farnesyl diphosphate synthase (farnesyl pyrophosphate synthetase, dimethylallyltranstransferase, geranyltranstransferase) (FDPS), mRNA
    U36376, Artemisia annua farnesyl diphosphate synthase (fps1) mRNA, complete cds
    XM_001352, Homo sapiens farnesyl diphosphate synthase (farnesyl pyrophosphate synthetase, dimethylallyltranstransferase, geranyltranstransferase) (FDPS), mRNA
    XM_034497, Homo sapiens farnesyl diphosphate synthase (farnesyl pyrophosphate synthetase, dimethylallyltranstransferase, geranyltranstransferase) (FDPS), mRNA
    XM_034498, Homo sapiens farnesyl diphosphate synthase (farnesyl pyrophosphate synthetase, dimethylallyltranstransferase, geranyltranstransferase) (FDPS), mRNA
    XM_034499, Homo sapiens farnesyl diphosphate synthase (farnesyl pyrophosphate synthetase, dimethylallyltranstransferase, geranyltranstransferase) (FDPS), mRNA
    XM_034500, Homo sapiens farnesyl diphosphate synthase (farnesyl pyrophosphate synthetase, dimethylallyltranstransferase, geranyltranstransferase) (FDPS), mRNA

The most preferred sources of genes for the upper isoprenoid pathway in the present invention are the endogenous genes of E. coli MG1655.

The Carotenoid Biosynthetic Pathway - Lower Isoprenoid Pathway

The division between the upper isoprenoid pathway and the lower carotenoid pathway is somewhat subjective. Because FPP synthesis is common in both carotenogenic and non-carotenogenic bacteria, the Applicants consider the first step in the lower carotenoid biosynthetic pathway to begin with the prenyltransferase reaction converting farnesyl pyrophosphate (FPP) to geranylgeranyl pyrophosphate (GGPP). The gene crtE, encoding GGPP synthetase, is responsible for this prenyltransferase reaction, which adds IPP to FPP to produce the 20-carbon molecule GGPP. A condensation reaction of two molecules of GGPP occurs to form phytoene (PPPP), the first 40-carbon molecule of the lower carotenoid biosynthesis pathway. This enzymatic reaction is catalyzed by phytoene synthase.

Lycopene, which imparts a "red"-colored spectrum, is produced from phytoene through four sequential dehydrogenation reactions by the removal of eight atoms of hydrogen. This series of dehydrogenation reactions is catalyzed by phytoene desaturase. Intermediates in this reaction are phytofluene, zeta-carotene, and neurosporene.
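
The hydrogen arithmetic in this paragraph can be checked directly: starting from phytoene (C40H64), each of the four desaturations removes two hydrogen atoms, giving phytofluene (C40H62), zeta-carotene (C40H60), neurosporene (C40H58) and lycopene (C40H56). The Python sketch below is illustrative only; it tracks (carbon, hydrogen) counts and assumes, as the text states, that each step is a simple dehydrogenation. It also records the carbon count behind the crtE and phytoene synthase steps described in the preceding paragraph (two C20 GGPP units giving a C40 backbone).

```python
GGPP_CARBONS = 20    # crtE product: FPP (C15) + IPP (C5)
PHYTOENE = (40, 64)  # (carbon, hydrogen) count of phytoene, C40H64

# Products of successive desaturations named in the text, in order.
DESATURATION_PRODUCTS = ["phytofluene", "zeta-carotene", "neurosporene", "lycopene"]


def desaturate(formula):
    """One dehydrogenation step removes two hydrogen atoms; carbons are unchanged."""
    carbons, hydrogens = formula
    return carbons, hydrogens - 2


def desaturation_series(start, products):
    """List (name, (C, H)) for the starting compound and each successive product."""
    series = [("phytoene", start)]
    current = start
    for name in products:
        current = desaturate(current)
        series.append((name, current))
    return series


series = desaturation_series(PHYTOENE, DESATURATION_PRODUCTS)
hydrogens_removed = PHYTOENE[1] - series[-1][1][1]

if __name__ == "__main__":
    assert 2 * GGPP_CARBONS == PHYTOENE[0]  # two GGPP condense to one C40 phytoene
    for name, (c, h) in series:
        print(f"{name:14s} C{c}H{h}")
    print("hydrogen atoms removed:", hydrogens_removed)  # 8
```
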

Lycopene cyclase (crtY) converts lycopene to β-carotene.

β-Carotene is converted to zeaxanthin via a hydroxylation reaction resulting from the activity of β-carotene hydroxylase (encoded by the crtZ gene). β-Cryptoxanthin is an intermediate in this reaction.

β-Carotene is converted to canthaxanthin by β-carotene ketolase (encoded by the crtW gene). Echinenone is an intermediate in this reaction.
Canthaxanthin can then be converted to astaxanthin by β-carotene hydroxylase (encoded by the crtZ gene). Adonirubin is an intermediate in this reaction.

Zeaxanthin can be converted to zeaxanthin-β-diglucoside. This reaction is catalyzed by zeaxanthin glucosyl transferase (crtX).

Zeaxanthin can be converted to astaxanthin by β-carotene ketolase encoded by a crtW or crtO gene. Adonixanthin is an intermediate in this reaction.

Spheroidene can be converted to spheroidenone by spheroidene 
monooxygenase (encoded by crtA). 

Neurosporene can be converted to spheroidene and lycopene can be 
converted to spirilloxanthin by the sequential actions of hydroxyneurosporene 
synthase, methoxyneurosporene desaturase, and hydroxyneurosporene-O- 
methyltransferase encoded by the crtC, crtD and crtF genes, respectively. 

β-Carotene can be converted to isorenieratene by β-carotene desaturase encoded by crtU.
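
The conversions listed above form a small branching network, so a route from one carotenoid to another (and the genes it needs) can be found by breadth-first search. The following Python sketch is a hypothetical illustration, not a design tool from the source: the edge list transcribes only the conversions described in this section, labels the phytoene desaturase steps with the crtI gene name used in Table 3, and collapses the sequential crtC, crtD and crtF reactions into single edges. Where two routes are equally short (for example lycopene to astaxanthin via canthaxanthin or via zeaxanthin), the one returned depends on edge order.

```python
from collections import deque

# substrate -> list of (gene label, product), transcribed from the text above
REACTIONS = {
    "phytoene": [("crtI", "phytofluene")],
    "phytofluene": [("crtI", "zeta-carotene")],
    "zeta-carotene": [("crtI", "neurosporene")],
    "neurosporene": [("crtI", "lycopene"), ("crtC/crtD/crtF", "spheroidene")],
    "lycopene": [("crtY", "beta-carotene"), ("crtC/crtD/crtF", "spirilloxanthin")],
    "beta-carotene": [("crtZ", "beta-cryptoxanthin"), ("crtW", "echinenone"),
                      ("crtU", "isorenieratene")],
    "beta-cryptoxanthin": [("crtZ", "zeaxanthin")],
    "echinenone": [("crtW", "canthaxanthin")],
    "canthaxanthin": [("crtZ", "adonirubin")],
    "adonirubin": [("crtZ", "astaxanthin")],
    "zeaxanthin": [("crtX", "zeaxanthin-beta-diglucoside"),
                   ("crtW/crtO", "adonixanthin")],
    "adonixanthin": [("crtW/crtO", "astaxanthin")],
    "spheroidene": [("crtA", "spheroidenone")],
}


def route(start, target):
    """Shortest list of (substrate, gene, product) steps from start to target, or None."""
    came_from = {start: None}  # product -> (substrate, gene) that first reached it
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            steps = []
            while came_from[node] is not None:
                prev, gene = came_from[node]
                steps.append((prev, gene, node))
                node = prev
            return steps[::-1]
        for gene, product in REACTIONS.get(node, []):
            if product not in came_from:
                came_from[product] = (node, gene)
                queue.append(product)
    return None


if __name__ == "__main__":
    for substrate, gene, product in route("phytoene", "beta-carotene"):
        print(f"{substrate} --{gene}--> {product}")
```

Reading the gene labels off a returned route gives a rough list of the lower-pathway genes a host would need for that product.
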

Genes encoding elements of the lower carotenoid biosynthetic 
pathway are known from a variety of plant, animal, and bacterial sources, as 
shown in Table 3. 

Table 3 

Sources of Genes Encoding the Lower Carotenoid Biosynthetic Pathway

Gene
    GenBank Accession Number and Source Organism

crtE (GGPP synthase)
    AB000835, Arabidopsis thaliana
    AB016043 and AB019036, Homo sapiens
    AB016044, Mus musculus
    AB027705 and AB027706, Daucus carota
    AB034249, Croton sublyratus
    AB034250, Scoparia dulcis
    AF020041, Helianthus annuus
    AF049658, Drosophila melanogaster signal recognition particle 19kDa protein (srp19) gene, partial sequence; and geranylgeranyl pyrophosphate synthase (quemao) gene, complete cds
    AF049659, Drosophila melanogaster geranylgeranyl pyrophosphate synthase mRNA, complete cds
    AF139916, Brevibacterium linens
    AF279807, Penicillium paxilli geranylgeranyl pyrophosphate synthase (ggs1) gene, complete
    AF279808, Penicillium paxilli dimethylallyl tryptophan synthase (paxD) gene, partial cds; and cytochrome P450 monooxygenase (paxQ), cytochrome P450 monooxygenase (paxP), PaxC (paxC), monooxygenase (paxM), geranylgeranyl pyrophosphate synthase (paxG), PaxU (paxU), and metabolite transporter (paxT) genes, complete cds
    AJ010302, Rhodobacter sphaeroides
    AJ133724, Mycobacterium aurum
    AJ276129, Mucor circinelloides f. lusitanicus carG gene for geranylgeranyl pyrophosphate synthase, exons 1-6
    D85029, Arabidopsis thaliana mRNA for geranylgeranyl pyrophosphate synthase, partial cds
    L25813, Arabidopsis thaliana
    L37405, Streptomyces griseus geranylgeranyl pyrophosphate synthase (crtB), phytoene desaturase (crtE) and phytoene synthase (crtI) genes, complete cds
    U15778, Lupinus albus geranylgeranyl pyrophosphate synthase (ggps1) mRNA, complete cds
    uHHoi o, Arabidopsis thaliana pregeranylgeranyl pyrophosphate synthase (GGPS2) mRNA, complete cds
    X92893, C. roseus
    X95596, S. griseus
    X98795, S. alba
    Y15112, Paracoccus marcusii

crtX (zeaxanthin glucosylase)
    D90087, E. uredovora
    M87280 and M90698, Pantoea agglomerans

crtY (lycopene-β-cyclase)
    AF139916, Brevibacterium linens
    AF152246, Citrus x paradisi
    AF218415, Bradyrhizobium sp. ORS278
    AF272737, Streptomyces griseus strain IFO13350
    AJ133724, Mycobacterium aurum
    AJ250827, Rhizomucor circinelloides f. lusitanicus carRP gene for lycopene cyclase/phytoene synthase, exons 1-2
    AJ276965, Phycomyces blakesleeanus carRA gene for phytoene synthase/lycopene cyclase, exons 1-2
    D58420, Agrobacterium aurantiacum
    D83513, Erythrobacter longus
    L40176, Arabidopsis thaliana lycopene cyclase (LYC) mRNA, complete cds
    M87280, Pantoea agglomerans
    U50738, Arabidopsis thaliana lycopene epsilon cyclase mRNA, complete cds
    U50739, Arabidopsis thaliana lycopene β cyclase mRNA, complete cds
    U62808, Flavobacterium ATCC21588
    X74599, Synechococcus sp. lcy gene for lycopene cyclase
    X81787, N. tabacum CrtL-1 gene encoding lycopene cyclase
    X86221, C. annuum
    X86452, L. esculentum mRNA for lycopene β-cyclase
    X95596, S. griseus
    X98796, N. pseudonarcissus

crtI (phytoene desaturase)
    AB046992, Citrus unshiu CitPDS1 mRNA for phytoene desaturase, complete cds
    AF039585, Zea mays phytoene desaturase (pds1) gene promoter region and exon 1
    AF049356, Oryza sativa phytoene desaturase precursor (Pds) mRNA, complete cds
    AF139916, Brevibacterium linens
    AF218415, Bradyrhizobium sp. ORS278
    AF251014, Tagetes erecta
    AF364515, Citrus x paradisi
    D58420, Agrobacterium aurantiacum
    D83514, Erythrobacter longus
    L16237, Arabidopsis thaliana
    L37405, Streptomyces griseus geranylgeranyl pyrophosphate synthase (crtB), phytoene desaturase (crtE) and phytoene synthase (crtI) genes, complete cds
    L39266, Zea mays phytoene desaturase (Pds) mRNA, complete cds
    M64704, Soybean phytoene desaturase
    M88683, Lycopersicon esculentum phytoene desaturase (pds) mRNA, complete cds
    S71770, carotenoid gene cluster
    U37285, Zea mays
    U46919, Solanum lycopersicum phytoene desaturase (Pds) gene, partial cds
    U62808, Flavobacterium ATCC21588
    X55289, Synechococcus pds gene for phytoene desaturase
    X59948, L. esculentum
    X62574, Synechocystis sp. pds gene for phytoene desaturase
    X68058, C. annuum pds1 mRNA for phytoene desaturase
    X71023, Lycopersicon esculentum pds gene for phytoene desaturase
    X78271, L. esculentum (Ailsa Craig) PDS gene
    X78434, P. blakesleeanus (NRRL1555) carB gene



21 



WO 2004/056974 



PCT/US2003/041811 





    X78815, N. pseudonarcissus





    X86783, H. pluvialis
    Y14807, Dunaliella bardawil
    Y15007, Xanthophyllomyces dendrorhous
    Y15112, Paracoccus marcusii
    Y15114, Anabaena PCC7210 crtP gene
    Z11165, R. capsulatus

crtB (phytoene synthase)
    AB001284, Spirulina platensis
    AB032797, Daucus carota PSY mRNA for phytoene synthase, complete cds
    AB034704, Rubrivivax gelatinosus
    AB037975, Citrus unshiu
    AF009954, Arabidopsis thaliana phytoene synthase (PSY) gene, complete cds
    AF139916, Brevibacterium linens
    AF152892, Citrus x paradisi
    AF218415, Bradyrhizobium sp. ORS278
    AF220218, Citrus unshiu phytoene synthase (Psy1) mRNA, complete cds
    AJ010302, Rhodobacter
    AJ133724, Mycobacterium aurum
    AJ278287, Phycomyces blakesleeanus carRA gene for lycopene cyclase/phytoene synthase
    AJ304825, Helianthus annuus mRNA for phytoene synthase (psy gene)
    AJ308385, Helianthus annuus mRNA for phytoene synthase (psy gene)
    D58420, Agrobacterium aurantiacum
    L23424, Lycopersicon esculentum phytoene synthase (PSY2) mRNA, complete cds
    L25812, Arabidopsis thaliana
    L37405, Streptomyces griseus geranylgeranyl pyrophosphate synthase (crtB), phytoene desaturase (crtE) and phytoene synthase (crtI) genes, complete cds
    M38424, Pantoea agglomerans phytoene synthase (crtE) gene, complete cds
    M87280, Pantoea agglomerans
    S71770, carotenoid gene cluster
    U32636, Zea mays phytoene synthase (Y1) gene, complete cds
    U62808, Flavobacterium ATCC21588
    U87626, Rubrivivax gelatinosus
    U91900, Dunaliella bardawil
    X52291, Rhodobacter capsulatus
    X60441, L. esculentum GTom5 gene for phytoene synthase
    X63873, Synechococcus PCC7942 pys gene for phytoene synthase
    X68017, C. annuum psy1 mRNA for phytoene synthase
    X69172, Synechocystis sp. pys gene for phytoene synthase
    X78814, N. pseudonarcissus

crtZ (β-carotene hydroxylase)
    D58420, Agrobacterium aurantiacum
    D58422, Alcaligenes sp.
    D90087, E. uredovora
    M87280, Pantoea agglomerans
    U62808, Flavobacterium ATCC21588
    Y15112, Paracoccus marcusii

crtW (β-carotene ketolase)
    AF218415, Bradyrhizobium sp. ORS278
    D45881, Haematococcus pluvialis
    D58420, Agrobacterium aurantiacum
    D58422, Alcaligenes sp.
    X86782, H. pluvialis
    Y15112, Paracoccus marcusii

crtO (β-C4-ketolase)
    X86782, H. pluvialis
    Y15112, Paracoccus marcusii

crtU (β-carotene dehydrogenase)
    AF047490, Zea mays
    AF121947, Arabidopsis thaliana
    AF139916, Brevibacterium linens
    AF195507, Lycopersicon esculentum
    AF272737, Streptomyces griseus strain IFO13350
    AF372617, Citrus x paradisi
    AJ133724, Mycobacterium aurum
    AJ224683, Narcissus pseudonarcissus
    D26095 and U38550, Anabaena sp.
    X89897, C. annuum
    Y15115, Anabaena PCC7210 crtQ gene

crtA (spheroidene monooxygenase)
    AJ010302, Rhodobacter sphaeroides
    Z11165 and X52291, Rhodobacter capsulatus

crtC (hydroxyneurosporene synthase)
    AB034704, Rubrivivax gelatinosus
    AF195122 and AJ010302, Rhodobacter sphaeroides
    AF287480, Chlorobium tepidum
    U73944, Rubrivivax gelatinosus
    X52291 and Z11165, Rhodobacter capsulatus
    Z21955, M. xanthus

crtD (carotenoid 3,4-desaturase)
    AJ010302 and X63204, Rhodobacter sphaeroides
    U73944, Rubrivivax gelatinosus
    X52291 and Z11165, Rhodobacter capsulatus

crtF (1-OH-carotenoid methylase)
    AB034704, Rubrivivax gelatinosus
    AF288602, Chloroflexus aurantiacus
    AJ010302, Rhodobacter sphaeroides
    X52291 and Z11165, Rhodobacter capsulatus



The most preferred source of genes for the lower carotenoid biosynthetic pathway in the present invention is Pantoea stewartii (ATCC No. 8199). Sequences of these preferred genes are presented as the following SEQ ID numbers: the crtE gene (SEQ ID NO:1), the crtX gene (SEQ ID NO:3), the crtY gene (SEQ ID NO:5), the crtI gene (SEQ ID NO:7), the crtB gene (SEQ ID NO:9) and the crtZ gene (SEQ ID NO:11).

Gene Mutations 

The invention relates to the discovery that certain mutations of chromosomal genes unexpectedly resulted in increased production of carotenoids. Several of the mutations were complete gene disruptions, whereas others were mutations in the carboxyl end of essential genes that resulted in an alteration, but not a complete loss, of gene function. Genes having complete disruptions included the deaD, mreC, and yfhE genes. Genes where only partial function was lost included the thrS, rpsA, rpoC, yjeR, and rhoL genes.

Where the disruptions occur in the deaD, mreC and yfhE genes, the elements of the upper and lower isoprenoid pathways may be either integrated into the cell genome or present, in whole or in part, on an autonomously replicating plasmid. However, in the case of the partial mutations in the thrS, rpsA, rpoC, yjeR, and rhoL genes, it is essential to the invention that the genes belonging to the lower isoprenoid pathway (needed for the production of the desired carotenoid compound) be present on a plasmid, and that the plasmid be antisense RNA regulated, as is the case with plasmids having the p15A and pMB1 replicons.
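
The conditions in the two preceding paragraphs amount to a simple rule about where the lower-pathway genes may reside for each class of mutation. A minimal Python sketch of that rule follows; the function name and the location labels are hypothetical, and only the gene names and the p15A/pMB1 replicons come from the text.

```python
# Mutation classes and replicons as named in the text.
COMPLETE_DISRUPTIONS = {"deaD", "mreC", "yfhE"}
PARTIAL_FUNCTION_MUTATIONS = {"thrS", "rpsA", "rpoC", "yjeR", "rhoL"}
ANTISENSE_REGULATED_REPLICONS = {"p15A", "pMB1"}


def configuration_supported(gene, lower_pathway_location, replicon=None):
    """Check a (mutation, lower-pathway location) pairing against the stated conditions.

    lower_pathway_location -- hypothetical label: "chromosome" or "plasmid"
    replicon               -- replicon of that plasmid, when the genes are plasmid-borne
    """
    if gene in COMPLETE_DISRUPTIONS:
        # Pathway genes may be integrated or carried, in whole or in part, on a plasmid.
        return lower_pathway_location in {"chromosome", "plasmid"}
    if gene in PARTIAL_FUNCTION_MUTATIONS:
        # Lower-pathway genes must sit on an antisense-RNA-regulated plasmid.
        return (lower_pathway_location == "plasmid"
                and replicon in ANTISENSE_REGULATED_REPLICONS)
    raise ValueError(f"{gene!r} is not one of the mutations described here")


if __name__ == "__main__":
    print(configuration_supported("mreC", "chromosome"))       # True
    print(configuration_supported("rpoC", "plasmid", "p15A"))  # True
    print(configuration_supported("rpoC", "chromosome"))       # False
```
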

The copy number of two types of ColE1-type plasmids, those with p15A- and pMB1-derived replicons, is regulated by an antisense mechanism (Tomizawa, J., Cell, 38:861-870 (1984)). A transcript (RNA II) from the ColE1 primer promoter forms a persistent hybrid with the template DNA near the replication origin. The hybridized RNA II is cleaved by RNase H to form the primer for replication initiation. Binding of the antisense RNA (RNA I) to RNA II inhibits this hybridization and thus prevents primer formation for replication. Rop is a small protein that, when bound to both RNA molecules, increases the stability of the RNA I/RNA II complex, thus decreasing the likelihood of replication.

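
The feedback loop described above, in which more plasmid copies make more RNA I and therefore fewer initiations per copy, can be illustrated with a toy model. This Python sketch is an assumption-laden illustration, not a model from the source or from Tomizawa: it takes the per-copy initiation rate as r_max / (1 + n / K), balances it against dilution by growth at rate mu, and represents Rop only as a smaller K (stronger inhibition). All parameter values are hypothetical; the point is that stronger antisense inhibition lowers the steady-state copy number, which settles at K (r_max / mu - 1).

```python
def steady_state_copy_number(r_max: float, mu: float, k_inhib: float) -> float:
    """Fixed point of the toy model dn/dt = n * (r_max / (1 + n / k_inhib) - mu).

    r_max   -- per-copy initiation rate with no antisense inhibition (hypothetical)
    mu      -- dilution rate from cell growth (hypothetical)
    k_inhib -- copy number at which inhibition halves initiation; a smaller value
               stands in for stronger RNA I inhibition, e.g. when Rop is present
    """
    if r_max <= mu:
        return 0.0  # plasmid is diluted out faster than it can replicate
    return k_inhib * (r_max / mu - 1.0)


def simulate_copy_number(n0, r_max, mu, k_inhib, dt=0.01, steps=20_000):
    """Forward-Euler integration of the same toy model from n0 copies per cell."""
    n = n0
    for _ in range(steps):
        per_copy_rate = r_max / (1.0 + n / k_inhib)  # inhibition rises with copy number
        n += dt * n * (per_copy_rate - mu)
    return n


if __name__ == "__main__":
    # Hypothetical parameters; only the comparison between the two cases is meaningful.
    with_rop = steady_state_copy_number(r_max=5.0, mu=1.0, k_inhib=5.0)
    without_rop = steady_state_copy_number(r_max=5.0, mu=1.0, k_inhib=10.0)
    print(with_rop, without_rop)  # 20.0 40.0
    print(round(simulate_copy_number(1.0, 5.0, 1.0, 5.0), 3))
```

The numerical integration is included only to show that the toy dynamics actually settle at the analytical fixed point.
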
Methods of constructing plasmids suitable for the present invention are common and well known in the art (Sambrook et al., supra). For example, the vector or cassette typically contains sequences directing transcription and translation of the relevant gene, a selectable marker, and sequences allowing autonomous replication or chromosomal integration. Suitable vectors comprise a region 5' of the gene which harbors transcriptional initiation controls and a region 3' of the DNA fragment which controls transcriptional termination. It is most preferred when both control regions are derived from genes homologous to the transformed host cell, although it is to be understood that such control regions need not be derived from the genes native to the specific species chosen as a production host.

Initiation control regions or promoters, which are useful to drive expression of the instant ORFs in the desired host cell, are numerous and familiar to those skilled in the art. Virtually any promoter capable of driving these genes is suitable for the present invention, including but not limited to CYC1, HIS3, GAL1, GAL10, ADH1, PGK, PHO5, GAPDH, ADC1, TRP1, URA3, LEU2, ENO, TPI (useful for expression in Saccharomyces); AOX1 (useful for expression in Pichia); and lac, ara, tet, trp, λPL, λPR, T7, tac, and trc (useful for expression in Escherichia coli), as well as the amy, apr, npr promoters and various phage promoters useful for expression in Bacillus.

Termination control regions may also be derived from various genes native to the preferred hosts. Optionally, a termination site may be unnecessary; however, it is most preferred if included.

Similarly, methods of making the present mutations are common and well known in the art, and any suitable method may be employed. For example, where the sequence of the gene to be mutated is known, one of the most effective methods of gene down-regulation is targeted gene disruption, where foreign DNA is inserted into a structural gene so as to disrupt transcription. This can be effected by the creation of genetic cassettes comprising the DNA to be inserted (often a genetic marker) flanked by sequence having a high degree of homology to a portion of the gene to be disrupted. Introduction of the cassette into the host cell results in insertion of the foreign DNA into the structural gene via the native DNA recombination mechanisms of the cell. (See for example Hamilton et al., J. Bacteriol., 171:4617-4622 (1989); Balbas et al., Gene, 136:211-213 (1993); Gueldener et al., Nucleic Acids Res., 24:2519-2524 (1996); and Smith et al., Methods Mol. Cell. Biol., 5:270-277 (1996).)
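
As a purely illustrative Python sketch of the cassette design just described, the code below copies homology arms from either side of a chosen point in a gene, places a marker between them, and then models the idealized result of recombination at both arms as a simple string insertion. The helper names are hypothetical, the sequences are random placeholders, the 50-base arm length is an arbitrary assumption, and real integration efficiency, marker choice and recombination machinery are outside the scope of this model.

```python
import random


def toy_sequence(length: int, seed: int) -> str:
    """Random A/C/G/T string standing in for a real gene (hypothetical data)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def build_disruption_cassette(gene: str, marker: str, insert_at: int, arm_len: int = 50) -> str:
    """Flank the marker with arm_len bases copied from each side of insert_at in the gene."""
    if not arm_len <= insert_at <= len(gene) - arm_len:
        raise ValueError("insertion point leaves no room for homology arms of this length")
    left_arm = gene[insert_at - arm_len:insert_at]
    right_arm = gene[insert_at:insert_at + arm_len]
    return left_arm + marker + right_arm


def simulate_double_crossover(gene: str, cassette: str, arm_len: int = 50):
    """Idealized outcome of recombination at both arms: the marker ends up between them.

    Returns the disrupted gene, or None if the two arms are not found adjacent in the gene.
    """
    left_arm, right_arm = cassette[:arm_len], cassette[-arm_len:]
    marker = cassette[arm_len:-arm_len]
    start = gene.find(left_arm + right_arm)
    if start < 0:
        return None
    junction = start + arm_len
    return gene[:junction] + marker + gene[junction:]


if __name__ == "__main__":
    gene = toy_sequence(300, seed=0)
    marker = "N" * 20  # placeholder for a selectable-marker sequence
    cassette = build_disruption_cassette(gene, marker, insert_at=150)
    disrupted = simulate_double_crossover(gene, cassette)
    print(len(gene), len(cassette), len(disrupted))  # 300 120 320
```
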

Antisense technology is another method of down-regulating genes where the sequence of the target gene is known. To accomplish this, a nucleic acid segment from the desired gene is cloned and operably linked to a promoter such that the antisense strand of RNA will be transcribed. This construct is then introduced into the host cell and the antisense strand of RNA is produced. Antisense RNA inhibits gene expression by preventing the accumulation of mRNA which encodes the protein of interest. The person skilled in the art will know that special considerations are associated with the use of antisense technologies in order to reduce expression of particular genes. For example, the proper level of expression of antisense genes may require the use of different chimeric genes utilizing different regulatory elements known to the skilled artisan.
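
The relationship between the cloned segment and the antisense transcript can be shown in a few lines. In this hypothetical Python sketch, a gene fragment cloned in reverse orientation behind a promoter is assumed to yield an RNA equal to the reverse complement of the sense strand, and a helper confirms that this RNA pairs base-for-base, antiparallel, with the corresponding mRNA. Only Watson-Crick pairs are considered; the 6-base segment is a made-up example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def sense_mrna(sense_dna: str) -> str:
    """mRNA has the sense-strand sequence with U in place of T."""
    return sense_dna.upper().replace("T", "U")


def antisense_rna(sense_dna: str) -> str:
    """Transcript of the segment cloned in reverse orientation: the reverse
    complement of the sense strand, written as RNA."""
    reverse_complement = sense_dna.upper().translate(COMPLEMENT)[::-1]
    return reverse_complement.replace("T", "U")


def fully_complementary(rna_a: str, rna_b: str) -> bool:
    """True if rna_a pairs with rna_b along its full length (antiparallel, Watson-Crick only)."""
    return len(rna_a) == len(rna_b) and all(
        (x, y) in PAIRS for x, y in zip(rna_a, reversed(rna_b))
    )


if __name__ == "__main__":
    segment = "ATGGCC"  # hypothetical fragment of a target gene
    print(antisense_rna(segment))  # GGCCAU
    print(fully_complementary(antisense_rna(segment), sense_mrna(segment)))  # True
```
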

Although targeted gene disruption and antisense technology offer effective means of down-regulating genes where the sequence is known, other less specific methodologies have been developed that are not sequence based. For example, cells may be exposed to UV radiation and then screened for the desired phenotype. Mutagenesis with chemical agents is also effective for generating mutants, and commonly used substances include chemicals that affect non-replicating DNA, such as HNO2 and NH2OH, as well as agents that affect replicating DNA, such as acridine dyes, notable for causing frameshift mutations. Specific methods for creating mutants using radiation or chemical agents are well documented in the art. See for example Thomas D. Brock in Biotechnology: A Textbook of Industrial Microbiology, Second Edition (1989) Sinauer Associates, Inc., Sunderland, MA, or Deshpande, Mukund V., Appl. Biochem. Biotechnol., 36:227 (1992).

Another non-specific method of gene disruption is the use of transposable elements or transposons. Transposons are genetic elements that insert randomly in DNA but can later be retrieved on the basis of sequence to determine where the insertion has occurred. Both in vivo and in vitro transposition methods are known. Both methods involve the use of a transposable element in combination with a transposase enzyme. When the transposable element or transposon is contacted with a nucleic acid fragment in the presence of the transposase, the transposable element will randomly insert into the nucleic acid fragment. The technique is useful for random mutagenesis and for gene isolation, since the disrupted gene may be identified on the basis of the sequence of the transposable element. Kits for in vitro transposition are commercially available (see for example The Primer Island Transposition Kit, available from Perkin Elmer Applied Biosystems, Branchburg, NJ, based upon the yeast Ty1 element; The Genome Priming System, available from New England Biolabs, Beverly, MA, based upon the bacterial transposon Tn7; and the EZ::TN Transposon Insertion Systems, available from Epicentre Technologies, Madison, WI, based upon the Tn5 bacterial transposable element).

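
Identifying a disrupted gene "on the basis of the sequence of the transposable element" typically means finding the transposon end in a sequencing read and locating the adjoining bases in the genome. The Python sketch below illustrates that idea under stated assumptions: reads start inside the transposon and read outward, the genome is a random placeholder, the first exact match is taken, and the 19-base Tn5 mosaic-end sequence is used only as an example end that should be confirmed against the documentation of the kit actually used. It is a hypothetical illustration; the mapping described later in this document relied on PCR fragment analysis.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Example transposon end (Tn5 mosaic end as commonly written); confirm for the kit used.
TN5_END = "CTGTCTCTTATACACATCT"


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def map_insertion(read: str, tn_end: str, genome: str, min_flank: int = 20):
    """Locate the chromosomal junction next to a transposon end found in a read.

    Returns (junction, strand): the insertion lies immediately before genome index
    `junction`; strand is "+" if the flank matches the forward strand, "-" otherwise.
    Returns None if the end, or at least min_flank flanking bases, cannot be matched.
    Only the first exact match is used; repeated sequences are not handled.
    """
    i = read.find(tn_end)
    if i < 0:
        return None
    flank = read[i + len(tn_end):]
    if len(flank) < min_flank:
        return None
    pos = genome.find(flank)
    if pos >= 0:
        return pos, "+"
    pos = genome.find(revcomp(flank))
    if pos >= 0:
        # The read runs leftward on the forward strand, so the junction is at the right end.
        return pos + len(flank), "-"
    return None


def toy_genome(length: int, seed: int) -> str:
    """Random placeholder for a chromosome sequence (hypothetical data)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


if __name__ == "__main__":
    genome = toy_genome(2000, seed=1)
    read_fwd = TN5_END + genome[700:740]           # read exits the transposon rightward
    read_rev = TN5_END + revcomp(genome[660:700])  # read exits leftward
    print(map_insertion(read_fwd, TN5_END, genome))
    print(map_insertion(read_rev, TN5_END, genome))
```
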
In the context of the present invention, random mutagenesis was performed using the EZ::TN™ <KAN-2>Tnp Transposome™ kit (Epicentre Technologies, Madison, WI). Eight chromosomal mutations were isolated that increased β-carotene production in E. coli. These included Tn5 insertions in three non-essential genes (deaD, mreC, hscB) that likely disrupted their functions, and Tn5 insertions in the carboxyl end of five essential genes (thrS, rpsA, rpoC, yjeR, rhoL) that likely altered their functions.

Carotenoid Production 

The mutations described by the present invention are in housekeeping genes. Since the transcription, translation and protein biosynthetic apparatus is largely similar irrespective of the microorganism and the feedstock, these mutations are likely to have a similar effect in many host strains that can be used for carotenoid production, including, but not limited to, fungal or yeast species such as Aspergillus, Trichoderma, Saccharomyces, Pichia, Candida, and Hansenula, or bacterial species such as Salmonella, Bacillus, Acinetobacter, Zymomonas, Agrobacterium, Erythrobacter, Chlorobium, Chromatium, Flavobacterium, Cytophaga, Rhodobacter, Rhodococcus, Streptomyces, Brevibacterium, Corynebacteria, Mycobacterium, Deinococcus, Escherichia, Erwinia, Pantoea, Pseudomonas, Sphingomonas, Methylomonas, Methylobacter, Methylococcus, Methylosinus, Methylomicrobium, Methylocystis, Methylobacterium, Alcaligenes, Synechocystis, Synechococcus, Anabaena, Thiobacillus, Methanobacterium, Klebsiella, Myxococcus, and Staphylococcus.

Large-scale microbial growth may utilize a fermentable carbon substrate covering a wide range of simple or complex carbohydrates, organic acids and alcohols, and/or saturated hydrocarbons such as methane, or carbon dioxide in the case of photosynthetic or chemoautotrophic hosts. Carotenoids produced in the hosts include, but are not limited to, antheraxanthin, adonixanthin, astaxanthin, canthaxanthin, capsorubin, β-cryptoxanthin, didehydrolycopene, didehydrolycopene, β-carotene, ζ-carotene, δ-carotene, γ-carotene, keto-γ-carotene, ψ-carotene, ε-carotene, β,ψ-carotene, torulene, echinenone, gamma-carotene, zeta-carotene, alpha-cryptoxanthin, diatoxanthin, 7,8-didehydroastaxanthin, fucoxanthin, fucoxanthinol, isorenieratene, β-isorenieratene, lactucaxanthin, lutein, lycopene, neoxanthin, neurosporene, hydroxyneurosporene, peridinin, phytoene, rhodopin, rhodopin glucoside, siphonaxanthin, spheroidene, spheroidenone, spirilloxanthin, uriolide, uriolide acetate, violaxanthin, zeaxanthin-β-diglucoside, zeaxanthin, and C30-carotenoids.

Description of the Preferred Embodiments 

Using random transposon mutagenesis, several mutations in genes outside the isoprenoid/carotenoid biosynthetic pathway have been discovered. These mutations serve to increase production of β-carotene in an E. coli strain harboring a reporter plasmid expressing genes involved in carotenoid biosynthesis.

In one embodiment, the Pantoea stewartii (ATCC No. 8199) crtEXYIB gene cluster was cloned into a vector, creating reporter plasmid pPCB15 (Examples 1 and 3; Figure 5; SEQ ID NO. 43). Identification of the individual genes was verified by sequence analysis (Example 2, Table 4). Plasmid pPCB15 was transformed into E. coli MG1655, creating a strain capable of β-carotene production. The level of β-carotene production in E. coli MG1655 (pPCB15) was used as the control.

In another embodiment, chromosomal transposon mutagenesis was
performed on the E. coli strain containing pPCB15 (Example 3; Figure 2).
Resulting strains that developed a deeper yellow color than the
control strain were selected and analyzed (Example 4; Figures 2 and 3).
Three mutant strains (Y1, Y8, and Y12) exhibited a 2.5- to 3.5-fold increase in
production of β-carotene, while mutants Y4, Y15, Y16, Y17, and Y21 showed
a 1.5- to 2.0-fold increase.

In another embodiment, the transposon insertion sites on the E. coli
chromosome were mapped and confirmed using PCR fragment analysis
(Examples 5 and 6, Table 5, Figure 4). In a preferred embodiment, the
identified mutant genes containing a Tn5 insertion are selected from the
group consisting of thrS (SEQ ID NO. 35), deaD (SEQ ID NO. 36), rpsA (SEQ
ID NO. 37), rpoC (SEQ ID NO. 38), yjeR (SEQ ID NO. 39), mreC (SEQ ID
NO. 40), rhoL (SEQ ID NO. 41), and hscB (yfhE) (SEQ ID NO. 42).

In another embodiment, a mutated gene selected from SEQ ID
NOs: 35-42 is engineered into a carotenoid-producing microorganism (one
naturally possessing the isoprenoid/carotenoid pathway or one in which the
pathway has been engineered by recombinant technology) to increase carotenoid
production. In a preferred embodiment, two or more of the mutant genes are
incorporated into a carotenoid-producing microorganism to optimize
carotenoid production. In a more preferred embodiment, the carotenoid-producing
microorganism is a recombinantly modified E. coli strain.
Several strains of E. coli capable of increased carotenoid production
have been created. Mutations in genes not considered part of either the
isoprenoid or carotenoid biosynthetic pathways were created, mapped, and
sequenced. These novel mutant sequences can be used alone or in
combination with others to create strains of E. coli exhibiting enhanced
carotenoid production.

EXAMPLES 

The present invention is further defined in the following Examples. It 
should be understood that these Examples, while indicating preferred
embodiments of the invention, are given by way of illustration only. From the
above discussion and these Examples, one skilled in the art can ascertain the 
essential characteristics of this invention, and without departing from the spirit 
and scope thereof, can make various changes and modifications of the 
invention to adapt it to various usages and conditions. 

GENERAL METHODS

Standard recombinant DNA and molecular cloning techniques used in
the Examples are well known in the art and are described by Sambrook, J.,
Fritsch, E. F. and Maniatis, T., Molecular Cloning: A Laboratory Manual, Cold
Spring Harbor Laboratory Press: Cold Spring Harbor (1989) (Maniatis); by
T. J. Silhavy, M. L. Berman, and L. W. Enquist, Experiments with Gene
Fusions, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1984); and
by Ausubel, F. M. et al., Current Protocols in Molecular Biology, published by
Greene Publishing Assoc. and Wiley-Interscience (1987).

Materials and methods suitable for the maintenance and growth of
bacterial cultures are well known in the art. Techniques suitable for use in the
following examples may be found in Manual of Methods for
General Bacteriology (Phillipp Gerhardt, R. G. E. Murray, Ralph N. Costilow,
Eugene W. Nester, Willis A. Wood, Noel R. Krieg and G. Briggs Phillips, eds.),
American Society for Microbiology, Washington, DC (1994), or in Thomas D.
Brock, Biotechnology: A Textbook of Industrial Microbiology, Second
Edition, Sinauer Associates, Inc., Sunderland, MA (1989). All reagents,
restriction enzymes, and materials used for the growth and maintenance of
bacterial cells were obtained from Aldrich Chemicals (Milwaukee, WI), DIFCO
Laboratories (Detroit, MI), GIBCO/BRL (Gaithersburg, MD), or Sigma
Chemical Company (St. Louis, MO) unless otherwise specified.

Manipulations of genetic sequences were accomplished using the suite
of programs available from the Genetics Computer Group Inc. (Wisconsin
Package Version 9.0, Genetics Computer Group (GCG), Madison, WI).
Where the GCG program "Pileup" was used, the default gap creation value of
12 and the default gap extension value of 4 were used. Where the GCG
"Gap" or "Bestfit" programs were used, the default gap creation penalty of 50
and the default gap extension penalty of 3 were used. Multiple alignments
were created using the FASTA program incorporating the Smith-Waterman
algorithm (W. R. Pearson, Comput. Methods Genome Res., [Proc. Int. Symp.]
(1994), Meeting Date 1992, 111-20. Editor(s): Suhai, Sandor. Publisher:
Plenum, New York, NY). In any case where program parameters were not
prompted for, in these or any other programs, default values were used.
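The FASTA search mentioned above relies on the Smith-Waterman local alignment algorithm, scored with separate gap-creation and gap-extension penalties. As a rough illustration of how such a score is computed, here is a minimal Python sketch of Smith-Waterman scoring with affine gaps (Gotoh's recurrences). It is not the FASTA or GCG implementation: the match/mismatch scores, the default penalties, and the convention that a gap of length k costs `gap_open + gap_extend * k` are assumptions chosen for the example, and the function returns only the best score, not the alignment itself.

```python
def smith_waterman_affine(a: str, b: str, match: int = 2, mismatch: int = -1,
                          gap_open: int = 3, gap_extend: int = 1) -> int:
    """Best local alignment score of sequences a and b (score only, no traceback).

    Assumed gap convention: a gap of length k costs gap_open + gap_extend * k.
    H[i][j]: best score of a local alignment ending at a[i-1], b[j-1].
    E[i][j]: best score of an alignment ending with b[j-1] opposite a gap.
    F[i][j]: best score of an alignment ending with a[i-1] opposite a gap.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap after an aligned pair or extend an existing gap.
            E[i][j] = max(H[i][j - 1] - gap_open - gap_extend, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open - gap_extend, F[i - 1][j] - gap_extend)
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are floored at zero so an alignment can restart anywhere.
            H[i][j] = max(0, H[i - 1][j - 1] + pair, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


print(smith_waterman_affine("CCCCCTTGGGGG", "CCCCCGGGGG"))  # 15
```

In this toy case ten matches (20) minus one 2-base gap (3 + 2) gives 15, which beats the best ungapped alignment (14, with two mismatches in place of the gap).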

The meaning of abbreviations is as follows: "h" means hour(s), "min"
means minute(s), "sec" means second(s), "d" means day(s), "μL" means
microliters, "mL" means milliliters, and "L" means liters.

EXAMPLE 1 

Cloning of β-Carotene Production Genes from Pantoea stewartii

Primers were designed using the sequence from Erwinia uredovora to
amplify by PCR a fragment containing the crt genes. These sequences
(5'-3') were:

ATGACGGTCTGCGCAAAAAAACACG (SEQ ID NO:13)

GAGAAATTATGTTGTGGATTTGGAATGC (SEQ ID NO:14)
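As a consistency check that is not part of the original procedure, the forward primer (SEQ ID NO:13) can be compared with the 5' end of the crtE sequence given in the Sequence Listing (SEQ ID NO:1). The sketch below assumes that the primer is meant to anneal starting at the first base of SEQ ID NO:1; both sequences are copied from this document.

```python
# Forward primer as listed above (SEQ ID NO:13).
FORWARD_PRIMER = "ATGACGGTCTGCGCAAAAAAACACG"
# First 60 nt of the crtE sequence in the Sequence Listing (SEQ ID NO:1).
CRTE_5PRIME = ("ttgacggtctgcgcaaaaaaacacgttcac"
               "cttactggcatttcggctgagcagttgctg")


def mismatch_positions(primer: str, template: str) -> list[int]:
    """1-based positions where the primer differs from the template's 5' end."""
    if len(template) < len(primer):
        raise ValueError("template is shorter than the primer")
    return [i + 1
            for i, (p, t) in enumerate(zip(primer.upper(), template.upper()))
            if p != t]


print(mismatch_positions(FORWARD_PRIMER, CRTE_5PRIME))  # [1]
```

The only difference is the first base: SEQ ID NO:1 begins with the TTG start codon noted in its annotation, whereas the primer begins with ATG, so a product made with this primer would carry ATG at that position.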

Chromosomal DNA was purified from Pantoea stewartii (ATCC No. 8199), and
Pfu Turbo polymerase (Stratagene, La Jolla, CA) was used in a PCR
amplification reaction under the following conditions: 94°C, 5 min; 25 cycles of
94°C (1 min), 60°C (1 min), 72°C (10 min); and 72°C for 10 min. A
single product of approximately 6.5 kb was observed following gel
electrophoresis. Taq polymerase (Perkin Elmer, Foster City, CA) was used in
a ten-minute 72°C reaction to add 3' adenosine nucleotides to the
fragment for TOPO cloning into pCR4-TOPO (Invitrogen, Carlsbad, CA),
creating the plasmid pPCB13. Following transformation into E. coli DH5α (Life
Technologies, Rockville, MD) by electroporation, several colonies appeared
bright yellow, indicating that they were producing a carotenoid
compound. Following plasmid isolation with the Qiagen (Valencia, CA) miniprep
kit as instructed by the manufacturer, the plasmid containing the
6.5 kb amplified fragment was transposed with pGPS1.1 using the GPS-1
Genome Priming System kit (New England Biolabs, Inc., Beverly, MA). A
number of these transposed plasmids were sequenced from each end of the
transposon. Sequence was generated on an ABI automatic sequencer using
dye terminator technology (US 5366860; EP 272007) with transposon-specific
primers. Sequence assembly was performed with the Sequencher
program (Gene Codes Corp., Ann Arbor, MI).

EXAMPLE 2

Identification and Characterization of Pantoea stewartii Genes
Genes encoding crtE, X, Y, I, B, and Z cloned from Pantoea stewartii
were identified by conducting BLAST (Basic Local Alignment Search Tool;
Altschul, S. F., et al., J. Mol. Biol. 215:403-410 (1990)) searches for similarity to
sequences contained in the BLAST "nr" database (comprising all non-redundant
GenBank® CDS translations, sequences derived from the
3-dimensional structure Brookhaven Protein Data Bank, the SWISS-PROT
protein sequence database, EMBL, and DDBJ databases). The sequences
obtained were analyzed for similarity to all publicly available DNA sequences
contained in the "nr" database using the BLASTN algorithm provided by the
National Center for Biotechnology Information (NCBI). The DNA sequences
were translated in all reading frames and compared for similarity to all publicly
available protein sequences contained in the "nr" database using the BLASTX
algorithm (Gish, W. and States, D. J., Nature Genetics, 3:266-272 (1993))
provided by the NCBI.
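BLASTX, as described above, translates the nucleotide query in all six reading frames (three on each strand) before comparing it with protein sequences. A minimal sketch of that translation step follows. It is not NCBI's code; it assumes the standard genetic code and treats the first codon like any other, so an alternative start codon such as TTG is rendered as Leu rather than Met.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG x TCAG x TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; '*' marks stops, 'X' marks ambiguous codons."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(dna: str) -> dict[str, str]:
    """Frames +1..+3 read the given strand; -1..-3 read its reverse complement."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# First 60 nt of SEQ ID NO:1 (crtE).
seq1_start = ("ttgacggtctgcgcaaaaaaacacgttcac"
              "cttactggcatttcggctgagcagttgctg")
frames = six_frame_translation(seq1_start)
print(frames["+1"])  # LTVCAKKHVHLTGISAEQLL
```

Frame +1 matches residues 2-20 of SEQ ID NO:2 (Thr Val Cys ... Gln Leu Leu); residue 1 is listed there as Met because a TTG start codon is read as Met at translation initiation.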

All comparisons were done using either the BLASTNnr or BLASTXnr
algorithm. The results of the BLAST comparisons are given in Table 4, which
summarizes the sequences to which each gene has the most similarity. Table 4
displays data based on the BLASTXnr algorithm, with values reported as
expect values. The Expect value estimates the statistical significance of the
match, specifying the number of matches with a given score that are
expected by chance alone in a search of a database of this size.
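The Expect value described above is usually computed with the Karlin-Altschul formula E = K·m·n·exp(-λS), where S is the raw alignment score, m and n are the query and database lengths, and λ and K are parameters of the scoring system. An equivalent form is E = m·n·2^(-S'), where the bit score is S' = (λS - ln K)/ln 2. The sketch below evaluates both forms. The λ and K values and the lengths are assumed, illustrative inputs, and real BLAST programs also correct m and n for edge effects, which is omitted here.

```python
import math


def expect_value(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S).

    m: query length; n: total database length (residues);
    lam, k: statistical parameters of the scoring system (supplied by the caller).
    """
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_from_bits(bits: float, m: int, n: int) -> float:
    """Equivalent form of the expectation: E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


# Assumed, illustrative parameters and lengths (not values from this study).
LAM, K = 0.267, 0.041
m, n = 300, 1_000_000_000
for s in (50, 100, 150):
    print(s, round(bit_score(s, LAM, K), 1), expect_value(s, m, n, LAM, K))
```

Because E scales linearly with n, the same alignment score is less significant when the database being searched is larger.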
Table 4

Gene Name: crtE
Enzyme: Geranylgeranyl pyrophosphate synthetase (or GGPP synthetase, or farnesyltranstransferase), EC 2.5.1.29
Most similar sequence: gi|117509|sp|P21684|CRTE_PANAN Geranylgeranyl pyrophosphate synthetase (GGPP synthetase) (Farnesyltranstransferase)

Gene Name: crtX
Enzyme: Zeaxanthin glucosyl transferase, EC 2.4.1.-
Most similar sequence: gi|1073294|pir||S52583 crtX protein - Erwinia herbicola

Gene Name: crtY
Enzyme: Lycopene cyclase
Most similar sequence: gi|1073295|pir||S52585 lycopene cyclase - Erwinia herbicola

Gene Name: crtI
Enzyme: Phytoene desaturase, EC 1.3.-.-
Most similar sequence: gi|1073299|pir||S52586 phytoene dehydrogenase (EC 1.3.-.-) - Erwinia herbicola

WO 2004/056974 



PCT/US2003/041811 




EXAMPLE 3

Isolation of Chromosomal Mutations that Increase Carotenoid Production
Wild-type E. coli is non-carotenogenic and synthesizes only the
farnesyl pyrophosphate precursor for carotenoids. When the crtEXYIB gene
cluster from Pantoea stewartii was introduced into E. coli, β-carotene was
synthesized and the cells exhibited the yellow color characteristic of β-carotene.
E. coli chromosomal mutations that increase carotenoid production should
therefore result in colonies that are more intensely pigmented, i.e., deeper yellow in
color (Figure 2).

The plasmid pPCB15 (camR) (SEQ ID NO. 43) encodes the carotenoid
biosynthesis gene cluster (crtEXYIB) from Pantoea stewartii (ATCC No.
8199). The pPCB15 plasmid was constructed by ligating SmaI-digested
pSU18 vector (Bartolome et al., Gene, 102:75-78 (1991)) with a blunt-ended
PmeI/NotI fragment carrying crtEXYIB from pPCB13 (Example 1). E. coli
MG1655 transformed with pPCB15 was used for transposon mutagenesis.
Mutagenesis was performed using the EZ:TN™ <KAN-2>Tnp Transposome™ kit
(Epicentre Technologies, Madison, WI) according to the manufacturer's
instructions. A 1 μL volume of the transposome was electroporated into 50
μL of highly electro-competent MG1655(pPCB15) cells. The mutant cells were
spread onto LB-Noble Agar (Difco Laboratories, Detroit, MI) plates with 25
μg/mL kanamycin and 25 μg/mL chloramphenicol, and grown at 37°C
overnight. Tens of thousands of mutant colonies were visually examined for
increased β-carotene production, as judged by deeper yellow
color development. The candidate mutants were re-streaked onto fresh
LB-Noble agar plates, and glycerol frozen stocks were made for further
characterization.

EXAMPLE 4
Quantitation of Carotenoid Production
To confirm that the mutants selected in Example 3 by visually screening for
deeper yellow colonies indeed produced more β-carotene, the carotenoids were
extracted from cultures grown from each mutant strain and quantified
spectrophotometrically. Each candidate mutant strain was cultured overnight in
10 mL LB medium with 25 μg/mL chloramphenicol in 50 mL flasks with shaking
at 250 rpm; MG1655(pPCB15) was used as the control. Carotenoids were
extracted from each cell pellet into 1 mL acetone for 15 min, and the amount of
β-carotene produced was measured at 455 nm. Cell density was measured at 600 nm.
The ratio OD455/OD600 was used to normalize β-carotene production across
different cultures. β-Carotene production was also verified by HPLC. Among
all the mutant clones tested, eight showed increased β-carotene production.
The averages of three independent measurements, with standard deviations,
were calculated and are shown in Figure 3. Mutants Y1, Y8 and Y12
showed a 2.5- to 3.5-fold increase in production of β-carotene. Mutants Y4, Y15,
Y16, Y17 and Y21 showed a 1.5- to 2-fold increase in production of β-carotene.
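The normalization in Example 4 divides the 455 nm absorbance of the acetone extract by the culture's OD600, and the fold increase compares a mutant's mean normalized value with that of the MG1655(pPCB15) control. A minimal sketch of that arithmetic follows. The readings are hypothetical placeholders, not data from Figure 3, and the use of the sample (n-1) standard deviation is an assumption because the text does not specify which form was used.

```python
from statistics import mean, stdev


def normalized(od455: float, od600: float) -> float:
    """Beta-carotene signal (OD455) per unit cell density (OD600)."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return od455 / od600


def summarize(replicates: list[tuple[float, float]]) -> tuple[float, float]:
    """Mean and sample standard deviation of OD455/OD600 over replicate cultures."""
    ratios = [normalized(od455, od600) for od455, od600 in replicates]
    return mean(ratios), stdev(ratios)


def fold_increase(mutant: list[tuple[float, float]],
                  control: list[tuple[float, float]]) -> float:
    """Mean normalized production of a mutant relative to the control strain."""
    return summarize(mutant)[0] / summarize(control)[0]


# Hypothetical (OD455, OD600) readings for three cultures each; not measured data.
control = [(0.20, 2.0), (0.22, 2.0), (0.18, 2.0)]
mutant = [(0.60, 2.0), (0.58, 2.0), (0.62, 2.0)]
print(summarize(control), summarize(mutant), round(fold_increase(mutant, control), 2))
```

Normalizing by OD600 before comparing strains keeps a culture that simply grew to a higher density from appearing to be a better producer.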

EXAMPLE 5

Mapping of the Transposon Insertions on the E. coli Chromosome
The transposon insertion site in each mutant was identified by PCR
and sequencing directly from chromosomal DNA of the mutant strains. A
modified single-primer PCR method (Karlyshev et al., BioTechniques,
28:1078-82, 2000) was used. For this method, a 100 μL volume of overnight
culture was heated at 99°C for 10 min in a PCR machine. Cell debris was
removed by centrifugation at 4000 × g for 10 min. A 1 μL volume of
supernatant was used in a 50 μL PCR reaction using either the Tn5PCRF
(5'-GCTGAGTTGAAGGATCAGATC-3'; SEQ ID NO:15) or the Tn5PCRR
(5'-CGAGCAAGACGTTTCCCGTTG-3'; SEQ ID NO:16) primer. PCR was carried
out as follows: 5 min at 95°C; 20 cycles of 92°C for 30 sec, 60°C for 30 sec,
72°C for 3 min; 30 cycles of 92°C for 30 sec, 40°C for 30 sec, 72°C for 2 min;
30 cycles of 92°C for 30 sec, 60°C for 30 sec, 72°C for 2 min. A 10 μL
volume of each PCR product was electrophoresed on an agarose gel to
evaluate product length. A 40 μL volume of each PCR product was purified
using the Qiagen PCR cleanup kit and sequenced using the sequencing primers
Kan-2 FP-1 (5'-ACCTACAACAAAGCTCTCATCAACC-3'; SEQ ID NO:17) or
Kan-2 RP-1 (5'-GCAATGTAACATCAGAGATTTTGAG-3'; SEQ ID NO:18)
provided with the EZ:TN™ <KAN-2>Tnp Transposome™ kit. The
chromosomal insertion site of the transposon was identified as the junction
between the Tn5 transposon and MG1655 chromosomal DNA by aligning the
sequence obtained from each mutant with the E. coli genomic sequence of
MG1655 (GenBank® Accession number U00096). Table 5 summarizes the
chromosomal insertion sites of the mutants that showed increased carotenoid
production. The numbers refer to the standard base pair (bp) numbering of the
E. coli genome. Most of the genes harboring transposon insertions are involved in
transcription, translation or RNA stability. Five of the disrupted genes (thrS,
rpsA, rpoC, yjeR, and rhoL) were previously reported to be essential for
viability of the E. coli cell. The transposon insertions obtained in these
five genes were located very close to the carboxyl-terminal end of the gene
and most likely resulted in functional, although truncated, proteins. The genes
affected in another set of five mutants (thrS, rpoC, mreC, rhoL, and hscB) are
part of demonstrated or predicted operons. Figure 4 shows the neighborhood
organization of the genes containing the transposon insertions.
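The single-primer PCR program in Example 5 has four stages. Writing it down as data makes its structure easy to check: three cycled stages totalling 80 cycles, with the middle stage annealing at 40°C instead of 60°C. The sketch below is a hypothetical representation (the `Step` and `Stage` classes are illustrative, not any thermocycler's API) and sums only the programmed hold times, ignoring ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass
class Step:
    temp_c: float   # block temperature in degrees C
    seconds: int    # hold time at that temperature


@dataclass
class Stage:
    cycles: int
    steps: list[Step]


# Program transcribed from Example 5.
SINGLE_PRIMER_PCR = [
    Stage(1, [Step(95, 300)]),                               # initial denaturation
    Stage(20, [Step(92, 30), Step(60, 30), Step(72, 180)]),
    Stage(30, [Step(92, 30), Step(40, 30), Step(72, 120)]),  # 40 C annealing stage
    Stage(30, [Step(92, 30), Step(60, 30), Step(72, 120)]),
]


def total_hold_seconds(program: list[Stage]) -> int:
    """Sum of programmed hold times; ramp times are ignored."""
    return sum(stage.cycles * sum(step.seconds for step in stage.steps)
               for stage in program)


def cycled_stages(program: list[Stage]) -> list[tuple[int, float]]:
    """(cycle count, lowest temperature) for every multi-step stage."""
    return [(stage.cycles, min(step.temp_c for step in stage.steps))
            for stage in program if len(stage.steps) > 1]


print(total_hold_seconds(SINGLE_PRIMER_PCR) // 60, "min")  # 265 min
print(cycled_stages(SINGLE_PRIMER_PCR))                    # [(20, 60), (30, 40), (30, 60)]
```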

Table 5

Localization of the transposon insertions in the E. coli chromosome

Mutant | Transposon insertion site | Gene disrupted (coordinates) | Function | Operon | Essential gene | Reference
Y1 | 1798679 | thrS: 1798666-1800594 | threonyl-tRNA synthetase | thrS-infC-rpmI-rplT | Yes | Johnson EJ, 1977, J Bacteriol 129:66-70
Y4 | 3304788 | deaD: 3303612-3305552 | RNA helicase | - | No | Toone WM, 1991, J Bacteriol 173:3291-302
Y8 | 962815 | rpsA: 961218-962891 | 30S ribosomal subunit protein S1 | - | Yes | Kitakawa M, 1982, Mol Gen Genet 185:445-7
Y12 | 4187062 | rpoC: 4182928-4187151 | RNA polymerase β' subunit | rpoB-rpoC | Yes | Post LE, 1979, Proc Natl Acad Sci USA 76:1697-1701
Y15 | 4389704 | yjeR: …-4389727 | oligoribonuclease | - | Yes | Ghosh S, 1999, Proc Natl Acad Sci USA 96:4372-7
Y16 | 3396592 | mreC: 3396512-3397615 | rod shape-determining protein | mreB-mreC-mreD | No | Wachi M, 1987, J Bacteriol 169:4935-40
Y17 | 3963892 | rhoL: 3963846-3963947 | rho operon leader peptide | rhoL-rho | Yes | Das A, 1976, Proc Natl Acad Sci USA 73:1959-63
Y21 | 2657233 | yfhE (hscB): 2656972-2657487 | heat shock cognate protein | hscB-hscA-fdx-yfhJ | Unknown | Takahashi Y, 1999, J Biochem (Tokyo) 126:917-26
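Table 5 gives each insertion coordinate together with the coordinates of the disrupted gene, and Example 5 states that the insertions in the essential genes lie close to the carboxyl-terminal end. A small helper can express where an insertion falls as the fraction of the gene lying 5' of it. This is a hypothetical sketch: Table 5 does not list which strand each gene is on, so the orientation must be supplied by the caller, and the forward-strand orientation used for rpoC below is an assumption.

```python
def fraction_upstream_of_insertion(insertion: int, start: int, end: int,
                                   strand: str) -> float:
    """Fraction of a gene's length lying 5' of a transposon insertion.

    start and end are the lower and upper genome coordinates of the gene
    (1-based, inclusive), as listed in Table 5. strand ('+' or '-') is not
    given in Table 5 and must be supplied by the caller.
    """
    if not start <= insertion <= end:
        raise ValueError("insertion lies outside the gene")
    length = end - start + 1
    if strand == "+":
        upstream = insertion - start + 1   # gene reads from start toward end
    elif strand == "-":
        upstream = end - insertion + 1     # gene reads from end toward start
    else:
        raise ValueError("strand must be '+' or '-'")
    return upstream / length


# rpoC (mutant Y12) from Table 5; forward-strand orientation is assumed.
print(round(fraction_upstream_of_insertion(4187062, 4182928, 4187151, "+"), 3))  # 0.979
```

Under that assumption roughly 98% of the rpoC coding sequence lies upstream of the Tn5 insertion, consistent with the statement that such insertions most likely leave a truncated but functional protein.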
EXAMPLE 6

Confirmation of Transposon Insertions in E. coli Chromosome
To confirm the transposon insertion sites identified in Example 5,
chromosome-specific primers were designed 400-800 bp upstream and downstream
of the transposon insertion site for each mutant. The primer
sequences are listed in Table 6. Three sets of PCR reactions were
performed for each mutant. The first set (PCR1) uses a
chromosome-specific upstream primer paired with a chromosome-specific
downstream primer. The second set (PCR2) uses a chromosome-specific
upstream primer paired with a transposon-specific primer (either Kan-2 FP-1
or Kan-2 RP-1, depending on the orientation of the transposon in the
chromosome). The third set (PCR3) uses a chromosome-specific
downstream primer paired with a transposon-specific primer. PCR conditions
were: 5 min at 95°C; 30 cycles of 92°C for 30 sec, 55°C for 30 sec, 72°C for 1
min; then 5 min at 72°C. Wild-type MG1655(pPCB15) cells served as the control.
For the control cells, the expected wild-type bands were detected in
PCR1, and no mutant band was detected in PCR2 or PCR3. For all eight
mutants, no wild-type bands were detected in PCR1, and the expected
mutant bands were detected in both PCR2 and PCR3. The sizes of the
products in PCR2 and PCR3 correlated well with the insertion sites in each
specific gene. Therefore, the mutants contained the transposon insertions as
mapped in Table 5, and these insertions were most likely responsible for the
phenotype of increased carotenoid production in each of the mutants.
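The logic of the three confirmation reactions in Example 6 can be summarized as a simple decision rule, sketched below. The function names are hypothetical, and the product-size helper is only an approximation: it ignores primer lengths, and the distance from the transposon end to the transposon-specific primer is an assumed input that this document does not give.

```python
def call_insertion(pcr1_wild_type_band: bool, pcr2_band: bool, pcr3_band: bool) -> str:
    """Interpret the three confirmation PCRs of Example 6.

    PCR1: chromosome-specific upstream + downstream primers.
    PCR2: chromosome-specific upstream primer + transposon-specific primer.
    PCR3: chromosome-specific downstream primer + transposon-specific primer.
    """
    if pcr1_wild_type_band and not (pcr2_band or pcr3_band):
        return "wild type"            # pattern reported for MG1655(pPCB15)
    if pcr2_band and pcr3_band and not pcr1_wild_type_band:
        return "insertion confirmed"  # pattern reported for all eight mutants
    return "inconclusive"


def junction_product_bp(chromosome_primer_to_insertion: int,
                        transposon_end_to_primer: int) -> int:
    """Approximate PCR2/PCR3 product length: chromosomal DNA from the
    chromosome-specific primer to the insertion junction, plus transposon DNA
    from its end to the transposon-specific primer (hypothetical helper)."""
    return chromosome_primer_to_insertion + transposon_end_to_primer


print(call_insertion(False, True, True))  # insertion confirmed
# Primer 600 bp from the insertion (within the 400-800 bp design range);
# the 100 bp transposon-side distance is a made-up value.
print(junction_product_bp(600, 100))      # 700
```

Because the chromosome-side distance differs for each primer pair, the PCR2 and PCR3 product sizes shift with the insertion site, which is why their sizes can be checked against the coordinates in Table 5.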

TABLE 6

List of chromosome specific primers used for mu…

Primer | Sequence | SEQ ID NO
Y1 F | 5'-agcaccatgatcatctggcg-3' | 19
Y1 R | 5'-cggttgcgctggaagaaaac-3' | 20
Y4 F | 5'-caccctgtgccattttcagc-3' | 21
Y4 R | 5'-cgttctgggtatggcccaga-3' | 22
Y8 F | 5'-aaagctaacccgtggcagca-3' | 23
Y8 R | 5'-tttgcgttccccgaggcata-3' | 24
Y12 F | 5'-ttccgaaatggcgtcagctc-3' | 25
Y12 R | 5'-atctctacattgattatgagtattc-3' | 26
Y15 F | 5'-ggatcgatcttgagatgacc-3' | 27
Y15 R | 5'-ctttcgtaattttcgcatttctg-3' | 28
Y16 F | 5'-cacgccaagttgcgcaagta-3' | 29
Y16 R | 5'-gcagaaaatggtgactcagg-3' | 30
Y17 F | 5'-ggcgatcctcgtcgatttct-3' | 31
Y17 R | 5'-acgcagacgagagtttgcgt-3' | 32
Y21 F | 5'-accgaatgcccttgctgttg-3' | 33
Y21 R | 5'-gggtgttcaggtatggctta-3' | 34
CLAIMS

What is claimed is:

1. A carotenoid overproducing microorganism comprising the
genes encoding a functional isoprenoid enzymatic biosynthetic pathway
comprising a disrupted gene selected from the group consisting of deaD,
mreC, and yfhE.

2. The carotenoid overproducing microorganism of Claim 1
wherein the isoprenoid enzymatic biosynthetic pathway comprises:

a) an upper isoprenoid enzymatic biosynthetic pathway comprising
the genes dxs, dxr, ygbP, ychB, ygbB, lytB, idi, ispA, and ispB;
and

b) a lower isoprenoid enzymatic biosynthetic pathway comprising
the genes crtE, crtB, crtI, and crtY.

3. The carotenoid overproducing microorganism of Claim 2
wherein the lower pathway optionally comprises genes selected from the
group consisting of crtZ and crtW.

4. The carotenoid overproducing microorganism of any of Claims
1-3 wherein the microorganism is selected from the group consisting of
bacteria, yeasts and filamentous fungi.

5. The carotenoid overproducing microorganism of Claim 4
wherein the microorganism is selected from the group consisting of Aspergillus,
Trichoderma, Saccharomyces, Pichia, Candida, Hansenula, Salmonella,
Bacillus, Acinetobacter, Zymomonas, Agrobacterium, Erythrobacter,
Chlorobium, Chromatium, Flavobacterium, Cytophaga, Rhodobacter,
Rhodococcus, Streptomyces, Brevibacterium, Corynebacteria,
Mycobacterium, Deinococcus, Paracoccus, Escherichia, Erwinia, Pantoea,
Pseudomonas, Sphingomonas, Methylomonas, Methylobacter,
Methylococcus, Methylosinus, Methylomicrobium, Methylocystis, Alcaligenes,
Synechocystis, Synechococcus, Anabaena, Thiobacillus, Staphylococcus,
Methanobacterium, Klebsiella, and Myxococcus.
6. The carotenoid overproducing microorganism of Claim 5
wherein the microorganism is E. coli.

7. The carotenoid overproducing microorganism of either of Claims
2 or 3 wherein the lower pathway genes reside on an autonomously
replicating plasmid.

8. The carotenoid overproducing microorganism of Claim 7
wherein the autonomously replicating plasmid comprises a replicon selected
from the group consisting of p15A and pMB1.

9. The carotenoid overproducing microorganism of either of Claims
2 or 3 wherein the lower pathway genes are chromosomally integrated.

10. A carotenoid overproducing microorganism according to Claim 1
wherein the microorganism is E. coli and wherein the disrupted deaD gene
has the sequence as set forth in SEQ ID NO: 36, the disrupted mreC gene
has the sequence as set forth in SEQ ID NO: 40 and the disrupted yfhE gene has
the sequence as set forth in SEQ ID NO: 42.

11. The carotenoid overproducing microorganism according to Claim
10 optionally comprising mutations selected from the group consisting of: a
mutation in the thrS gene as set forth in SEQ ID NO: 35, a mutation in the
rpsA gene as set forth in SEQ ID NO: 37, a mutation in the rpoC gene as set
forth in SEQ ID NO: 38, a mutation in the yjeR gene as set forth in SEQ ID
NO: 39, and a mutation in the rhoL gene as set forth in SEQ ID NO: 41.

12. A carotenoid overproducing E. coli comprising: 

a) an upper isoprenoid enzymatic biosynthetic pathway comprising
the genes dxs, dxr, ygbP, ychB, ygbB, lytB, idi, ispA, and ispB;

b) a lower isoprenoid enzymatic biosynthetic pathway comprising
the genes crtE, crtB, crtI, and crtY;

c) mutations selected from the group consisting of: a mutation in
the thrS gene as set forth in SEQ ID NO: 35, a mutation in the
rpsA gene as set forth in SEQ ID NO: 37, a mutation in the
rpoC gene as set forth in SEQ ID NO: 38, a mutation in the yjeR
gene as set forth in SEQ ID NO: 39, and a mutation in the rhoL
gene as set forth in SEQ ID NO: 41;

wherein the genes of the lower isoprenoid enzymatic biosynthetic
pathway reside on an autonomously replicating plasmid comprising a replicon
selected from the group consisting of p15A and pMB1.

13. The carotenoid overproducing E. coli of Claim 12 wherein the 
lower pathway optionally comprises genes selected from the group consisting 
of crtZ and crtW. 

14. A method for the production of a carotenoid comprising: 

a) contacting the carotenoid overproducing microorganism of any 
of Claims 1-3 with a fermentable carbon substrate; 

b) growing the carotenoid overproducing microorganism of step (a) 
for a time sufficient to produce a carotenoid; and 

c) optionally recovering the carotenoid from the carotenoid
overproducing microorganism of step (b).

15. A method for the production of a carotenoid comprising:

a) contacting the carotenoid overproducing E. coli of Claim 12
with a fermentable carbon substrate;

b) growing the carotenoid overproducing E. coli of step (a) for a
time sufficient to produce a carotenoid; and

c) optionally recovering the carotenoid from the carotenoid
overproducing microorganism of step (b).

16. A method according to either Claim 14 or 15 wherein the
carotenoid is selected from the group consisting of antheraxanthin,
adonixanthin, astaxanthin, canthaxanthin, capsorubin, β-cryptoxanthin,
didehydrolycopene, β-carotene, ζ-carotene, δ-carotene,
γ-carotene, keto-γ-carotene, ψ-carotene, ε-carotene, β,ψ-carotene,
torulene, echinenone, gamma-carotene, zeta-carotene, alpha-cryptoxanthin,
diatoxanthin, 7,8-didehydroastaxanthin, fucoxanthin, fucoxanthinol,
isorenieratene, β-isorenieratene, lactucaxanthin, lutein, lycopene, neoxanthin,
neurosporene, hydroxyneurosporene, peridinin, phytoene, rhodopin, rhodopin
glucoside, siphonaxanthin, spheroidene, spheroidenone, spirilloxanthin,
uriolide, uriolide acetate, violaxanthin, zeaxanthin-β-diglucoside, zeaxanthin,
and C30-carotenoids.
Figure 4
Neighborhood organization of the genes containing the transposon insertions (surviving labels: msbA, ycaH, Tn5 at 962815; rplL, rpoB, rpoC (4182928..4187151), htrC, thiH, thiG, Tn5 at 4187062; yjeR (…4389727), Tn5).
CL-2028PCT.ST25.txt
SEQUENCE LISTING

<110> E. I. du Pont de Nemours and Company, Inc.

<120> Mutations Affecting Carotenoid Production

<130> CL-2028 PCT

<150> US 60/435612

<151> 2002-12-19

<160> 43

<170> PatentIn version 3.2

<210> 1 

<211> 912 

<212> DNA 

<213> Pantoea stewartii 



<220> 

<221> misc_feature 

<222> (1)..(3)

<223> Alternative start codon usage of TTG instead of ATG.



<400> 1
ttgacggtct gcgcaaaaaa acacgttcac cttactggca tttcggctga gcagttgctg 60

gctgatatcg atagccgcct tgatcagtta ctgccggttc agggtgagcg ggattgtgtg 120

ggtgccgcga tgcgtgaagg cacgctggca ccgggcaaac gtattcgtcc gatgctgctg 180

ttattaacag cgcgcgatct tggctgtgcg atcagtcacg ggggattact ggatttagcc 240

tgcgcggttg aaatggtgca tgctgcctcg ctgattctgg atgatatgcc ctgcatggac 300

gatgcgcaga tgcgtcgggg gcgtcccacc attcacacgc agtacggtga acatgtggcg 360

attctggcgg cggtcgcttt actcagcaaa gcgtttgggg tgattgccga ggctgaaggt 420

ctgacgccga tagccaaaac tcgcgcggtg tcggagctgt ccactgcgat tggcatgcag 480

ggtctggttc agggccagtt taaggacctc tcggaaggcg ataaaccccg cagcgccgat 540

gccatactgc taaccaatca gtttaaaacc agcacgctgt tttgcgcgtc aacgcaaatg 600

gcgtccattg cggccaacgc gtcctgcgaa gcgcgtgaga acctgcatcg tttctcgctc 660

gatctcggcc aggcctttca gttgcttgac gatcttaccg atggcatgac cgataccggc 720

aaagacatca atcaggatgc aggtaaatca acgctggtca atttattagg ctcaggcgcg 780

gtcgaagaac gcctgcgaca gcatttgcgc ctggccagtg aacacctttc cgcggcatgc 840

caaaacggcc attccaccac ccaacttttt attcaggcct ggtttgacaa aaaactcgct 900

gccgtcagtt aa 912



<210> 2
<211> 303
<212> PRT

<213> Pantoea stewartii
<400> 2

Met Thr Val Cys Ala Lys Lys His Val His Leu Thr Gly Ile Ser Ala
1 5 10 15

Glu Gln Leu Leu Ala Asp Ile Asp Ser Arg Leu Asp Gln Leu Leu Pro
20 25 30

Val Gln Gly Glu Arg Asp Cys Val Gly Ala Ala Met Arg Glu Gly Thr
35 40 45

Leu Ala Pro Gly Lys Arg Ile Arg Pro Met Leu Leu Leu Leu Thr Ala
50 55 60

Arg Asp Leu Gly Cys Ala Ile Ser His Gly Gly Leu Leu Asp Leu Ala
65 70 75 80

Cys Ala Val Glu Met Val His Ala Ala Ser Leu Ile Leu Asp Asp Met
85 90 95

Pro Cys Met Asp Asp Ala Gln Met Arg Arg Gly Arg Pro Thr Ile His
100 105 110

Thr Gln Tyr Gly Glu His Val Ala Ile Leu Ala Ala Val Ala Leu Leu
115 120 125

Ser Lys Ala Phe Gly Val Ile Ala Glu Ala Glu Gly Leu Thr Pro Ile
130 135 140

Ala Lys Thr Arg Ala Val Ser Glu Leu Ser Thr Ala Ile Gly Met Gln
145 150 155 160

Gly Leu Val Gln Gly Gln Phe Lys Asp Leu Ser Glu Gly Asp Lys Pro
165 170 175

Arg Ser Ala Asp Ala Ile Leu Leu Thr Asn Gln Phe Lys Thr Ser Thr
180 185 190

Leu Phe Cys Ala Ser Thr Gln Met Ala Ser Ile Ala Ala Asn Ala Ser
195 200 205

Cys Glu Ala Arg Glu Asn Leu His Arg Phe Ser Leu Asp Leu Gly Gln
210 215 220

Ala Phe Gln Leu Leu Asp Asp Leu Thr Asp Gly Met Thr Asp Thr Gly
225 230 235 240

Lys Asp Ile Asn Gln Asp Ala Gly Lys Ser Thr Leu Val Asn Leu Leu
245 250 255

Gly Ser Gly Ala Val Glu Glu Arg Leu Arg Gln His Leu Arg Leu Ala
260 265 270

Ser Glu His Leu Ser Ala Ala Cys Gln Asn Gly His Ser Thr Thr Gln
275 280 285

Leu Phe Ile Gln Ala Trp Phe Asp Lys Lys Leu Ala Ala Val Ser
290 295 300

<210> 3 

<211> 1296 

<212> DNA 

<213> Pantoea stewartii 

<220> 

<221> CDS 

<222> (I) . . (1296) 



<400> 3 

atg age cat ttt gcg gtq ate gca ccg ccc ttt ttc age cat gtt cgc 

Met ser His Phe Ala val He Ala Pro Pro Phe Phe Ser His Val Arg 
1 5 10 15 



ctg ccc gec get ttt cat gcg ttg cag ata gag ggc gtq ate gtt gat 
Leu Pro Ala Ala Phe His Ala Leu Gin lie Glu Gly Val He Val Asp 
100 105 110 



cac gat cgt gtq ate gcg cat cat gca tgc aga atg ggt tta gee ccg 
His Asp Arg val He Ala His His Ala Cys Arg Met Gly Leu Ala Pro 
180 185 190 



48 



get ctg caa aac ctt get cag gaa tta gtq gee cgc ggt cat cgt gtt 96 
Ala Leu Gin Asn Leu Ala Gin Glu Leu val Ala Arg Gly His Arg val 
20 25 30 

acg ttt ttt cag caa cat gac tgc aaa gcg ctg gta acg ggc age gat 144 
Thr Phe Phe Gin Gin His Asp Cys Lys Ala Leu Val Thr 
35 40 .45 



ate gga ttc cag ace gtc gga ctg caa acg cat cct ccc ggt tec tta 192 
He Gly Phe Gin Thr val Gly Leu Gin Thr His Pro Pro Gly ser Leu 
50 55 60 

teg cac ctg ctg cac ctg gee gcg cac cca etc gga ccc teg atg tta 240 
Ser His xeu Leu His Leu Ala Ala His Pro Leu Gly Pro Ser Met Leu 
65 70 75 80 

cga ctg ate aat gaa atg gca cgt ace age gat atg ctt tgc egg gaa 288 
Arg Leu lie Asn Glu Met Ala Arg Thr Ser Asp Met Leu Cys Arg Glu 
85 90 95 



336 



caa atg gag ccg gca ggt gca gta gtc gca gaa gcg tea ggt ctg ccg 384 

Gin Met Glu Pro Ala Gly Ala Val Val Ala Glu Ala Ser Gly Leu Pro 
115 120 125 

ttt gtt teg gtg gee tgc gcg ctg ccg etc aac cgc gaa ccg ggt ttg 432 

Phe Val Ser val Ala cys Ala Leu Pro Leu Asn Arg Glu Pro Gly Leu 
130 135 140 

cct ctg gcg gtg atg cct ttc gag tac ggc ace age gat gcg get egg 480 

Pro Leu Ala Val Met Pro Phe Glu Tyr Gly Thr Ser Asp Ala Ala Arg 

145 150 155 ^ 160 

gaa cgc tat ace ace age gaa aaa att tat gac tgg ctg atg cga cgt 528 

Glu Arg Tyr Thr Thr Ser Glu Lys lie Tyr Asp Trp Leu Met Arg Arg 
165 " 170 . 175 



576 



cgt gaa aaa ctg cat cat tgt ttt tct cca ctg gca caa ate age cag 624 
Arg Glu Lys Leu His His cys Phe ser Pro Leu Ala Gin lie Ser Gin 
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195 



200 



CL-2028PCT.ST25.txt 
i 205 



ttg atc ccc gaa ctg gat ttt ccc cgc aaa gcg ctg cca gac tgc ttt 672
Leu Ile Pro Glu Leu Asp Phe Pro Arg Lys Ala Leu Pro Asp Cys Phe
210 215 220

cat gcg gtt gga ccg tta cgg caa ccc cag gag acg ccg ggg tca tca 720
His Ala Val Gly Pro Leu Arg Gln Pro Gln Gly Thr Pro Gly Ser Ser
225 230 235 240

act tct tat ttt ccg tcc ccg gac aaa ccc cgt att ttt gcc tcg ctg 768
Thr Ser Tyr Phe Pro Ser Pro Asp Lys Pro Arg Ile Phe Ala Ser Leu
245 250 255

ggc acc ctg cag gga cat cgt tat ggc ctg ttc agg acc atc gcc aaa 816
Gly Thr Leu Gln Gly His Arg Tyr Gly Leu Phe Arg Thr Ile Ala Lys
260 265 270

gcc tgc gaa gag gtg gat gcg cag tta ctg ttg gca cac tgt ggc ggc 864
Ala Cys Glu Glu Val Asp Ala Gln Leu Leu Leu Ala His Cys Gly Gly
275 280 285

ctc tca gcc acg cag gca ggt gaa ctg gcc cgg ggc ggg gac att cag 912
Leu Ser Ala Thr Gln Ala Gly Glu Leu Ala Arg Gly Gly Asp Ile Gln
290 295 300

gtt gtg gat ttt gcc gat caa tcc gca gca ctt tca cag gca cag ttg 960
Val Val Asp Phe Ala Asp Gln Ser Ala Ala Leu Ser Gln Ala Gln Leu
305 310 315 320

aca atc aca cat ggt ggg atg aat acg gta ctg gac gct att gct tcc 1008
Thr Ile Thr His Gly Gly Met Asn Thr Val Leu Asp Ala Ile Ala Ser
325 330 335

cgc aca ccg cta ctg gcg ctg ccg ctg gca ttt gat caa cct ggc gtg 1056
Arg Thr Pro Leu Leu Ala Leu Pro Leu Ala Phe Asp Gln Pro Gly Val
340 345 350

gca tca cga att gtt tat cat ggc atc ggc aag cgt gcg tct cgg ttt 1104
Ala Ser Arg Ile Val Tyr His Gly Ile Gly Lys Arg Ala Ser Arg Phe
355 360 365

act acc agc cat gcg ctg gcg cgg cag att cga tcg ctg ctg act aac 1152
Thr Thr Ser His Ala Leu Ala Arg Gln Ile Arg Ser Leu Leu Thr Asn
370 375 380

acc gat tac ccg cag cgt atg aca aaa att cag gcc gca ttg cgt ctg 1200
Thr Asp Tyr Pro Gln Arg Met Thr Lys Ile Gln Ala Ala Leu Arg Leu
385 390 395 400

gca ggc ggc aca cca gcc gcc gcc gat att gtt gaa cag gcg atg cgg 1248
Ala Gly Gly Thr Pro Ala Ala Ala Asp Ile Val Glu Gln Ala Met Arg
405 410 415

acc tgt cag cca gta ctc agt ggg cag gat tat gca acc gca cta tga 1296
Thr Cys Gln Pro Val Leu Ser Gly Gln Asp Tyr Ala Thr Ala Leu
420 425 430

<210> 4
<211> 431
<212> PRT

<213> Pantoea stewartii
<400> 4

Met Ser His Phe Ala Val Ile Ala Pro Pro Phe Phe Ser His Val Arg
1 5 10 15
Ala Leu Gln Asn Leu Ala Gln Glu Leu Val Ala Arg Gly His Arg Val
20 25 30

Thr Phe Phe Gln Gln His Asp Cys Lys Ala Leu Val Thr Gly Ser Asp
35 40 45

Ile Gly Phe Gln Thr Val Gly Leu Gln Thr His Pro Pro Gly Ser Leu
50 55 60

Ser His Leu Leu His Leu Ala Ala His Pro Leu Gly Pro Ser Met Leu
65 70 75 80

Arg Leu Ile Asn Glu Met Ala Arg Thr Ser Asp Met Leu Cys Arg Glu
85 90 95

Leu Pro Ala Ala Phe His Ala Leu Gln Ile Glu Gly Val Ile Val Asp
100 105 110

Gln Met Glu Pro Ala Gly Ala Val Val Ala Glu Ala Ser Gly Leu Pro
115 120 125

Phe Val Ser Val Ala Cys Ala Leu Pro Leu Asn Arg Glu Pro Gly Leu
130 135 140

Pro Leu Ala Val Met Pro Phe Glu Tyr Gly Thr Ser Asp Ala Ala Arg
145 150 155 160

Glu Arg Tyr Thr Thr Ser Glu Lys Ile Tyr Asp Trp Leu Met Arg Arg
165 170 175

His Asp Arg Val Ile Ala His His Ala Cys Arg Met Gly Leu Ala Pro
180 185 190

Arg Glu Lys Leu His His Cys Phe Ser Pro Leu Ala Gln Ile Ser Gln
195 200 205

Leu Ile Pro Glu Leu Asp Phe Pro Arg Lys Ala Leu Pro Asp Cys Phe
210 215 220

His Ala Val Gly Pro Leu Arg Gln Pro Gln Gly Thr Pro Gly Ser Ser
225 230 235 240

Thr Ser Tyr Phe Pro Ser Pro Asp Lys Pro Arg Ile Phe Ala Ser Leu
245 250 255

Gly Thr Leu Gln Gly His Arg Tyr Gly Leu Phe Arg Thr Ile Ala Lys
260 265 270

Ala Cys Glu Glu Val Asp Ala Gln Leu Leu Leu Ala His Cys Gly Gly
275 280 285

Leu Ser Ala Thr Gln Ala Gly Glu Leu Ala Arg Gly Gly Asp Ile Gln
290 295 300

Val Val Asp Phe Ala Asp Gln Ser Ala Ala Leu Ser Gln Ala Gln Leu
305 310 315 320

Thr Ile Thr His Gly Gly Met Asn Thr Val Leu Asp Ala Ile Ala Ser
325 330 335

Arg Thr Pro Leu Leu Ala Leu Pro Leu Ala Phe Asp Gln Pro Gly Val
340 345 350

Ala Ser Arg Ile Val Tyr His Gly Ile Gly Lys Arg Ala Ser Arg Phe
355 360 365

Thr Thr Ser His Ala Leu Ala Arg Gln Ile Arg Ser Leu Leu Thr Asn
370 375 380

Thr Asp Tyr Pro Gln Arg Met Thr Lys Ile Gln Ala Ala Leu Arg Leu
385 390 395 400

Ala Gly Gly Thr Pro Ala Ala Ala Asp Ile Val Glu Gln Ala Met Arg
405 410 415

Thr Cys Gln Pro Val Leu Ser Gly Gln Asp Tyr Ala Thr Ala Leu
420 425 430

<210> 5 
<211> 1149 
<212> DNA 

<213> Pantoea stewartii 
<220> 

<221> CDS 

<222> (1)..(1149) 

<400> 5 

atg caa ccg cac tat gat ctc att ctg gtc ggt gcc ggt ctg gct aat 48
Met Gln Pro His Tyr Asp Leu Ile Leu Val Gly Ala Gly Leu Ala Asn
1 5 10 15

ggc ctt atc gcg ctc cgg ctt cag caa cag cat ccg gat atg cgg atc 96
Gly Leu Ile Ala Leu Arg Leu Gln Gln Gln His Pro Asp Met Arg Ile
20 25 30

ttg ctt att gag gcg ggt cct gag gcg gga ggg aac cat acc tgg tcc 144
Leu Leu Ile Glu Ala Gly Pro Glu Ala Gly Gly Asn His Thr Trp Ser
35 40 45

ttt cac gaa gag gat tta acg ctg aat cag cat cgc tgg ata gcg ccg 192
Phe His Glu Glu Asp Leu Thr Leu Asn Gln His Arg Trp Ile Ala Pro
50 55 60

ctt gtg gtc cat cac tgg ccc gac tac cag gtt cgt ttc ccc caa cgc 240
Leu Val Val His His Trp Pro Asp Tyr Gln Val Arg Phe Pro Gln Arg
65 70 75 80

cgt cgc cat gtg aac agt ggc tac tac tgc gtg acc tcc cgg cat ttc 288
Arg Arg His Val Asn Ser Gly Tyr Tyr Cys Val Thr Ser Arg His Phe
85 90 95

gcc ggg ata ctc cgg caa cag ttt gga caa cat tta tgg ctg cat acc 336
Ala Gly Ile Leu Arg Gln Gln Phe Gly Gln His Leu Trp Leu His Thr
100 105 110

gcg gtt tca gcc gtt cat gct gaa tcg gtc cag tta gcg gat ggc cgg 384
Ala Val Ser Ala Val His Ala Glu Ser Val Gln Leu Ala Asp Gly Arg
115 120 125

att att cat gcc agt aca gtg atc gac gga cgg ggt tac acg cct gat 432
Ile Ile His Ala Ser Thr Val Ile Asp Gly Arg Gly Tyr Thr Pro Asp
130 135 140

tct gca cta cgc gta gga ttc cag gca ttt atc ggt cag gag tgg caa 480
Ser Ala Leu Arg Val Gly Phe Gln Ala Phe Ile Gly Gln Glu Trp Gln
145 150 155 160

ctg agc gcg ccg cat ggt tta tcg tca ccg att atc atg gat gcg acg 528
Leu Ser Ala Pro His Gly Leu Ser Ser Pro Ile Ile Met Asp Ala Thr
165 170 175

gtc gat cag caa aat ggc tac cgc ttt gtt tat acc ctg ccg ctt tcc 576
Val Asp Gln Gln Asn Gly Tyr Arg Phe Val Tyr Thr Leu Pro Leu Ser
180 185 190

gca acc gca ctg ctg atc gaa gac aca cac tac att gac aag gct aat 624
Ala Thr Ala Leu Leu Ile Glu Asp Thr His Tyr Ile Asp Lys Ala Asn
195 200 205

ctt cag gcc gaa cgg gcg cgt cag aac att cgc gat tat gct gcg cga 672
Leu Gln Ala Glu Arg Ala Arg Gln Asn Ile Arg Asp Tyr Ala Ala Arg
210 215 220

cag ggt tgg ccg tta cag acg ttg ctg cgg gaa gaa cag ggt gca ttg 720
Gln Gly Trp Pro Leu Gln Thr Leu Leu Arg Glu Glu Gln Gly Ala Leu
225 230 235 240

ccc att acg tta acg ggc gat aat cgt cag ttt tgg caa cag caa ccg 768
Pro Ile Thr Leu Thr Gly Asp Asn Arg Gln Phe Trp Gln Gln Gln Pro
245 250 255

caa gcc tgt agc gga tta cgc gcc ggg ctg ttt cat ccg aca acc ggc 816
Gln Ala Cys Ser Gly Leu Arg Ala Gly Leu Phe His Pro Thr Thr Gly
260 265 270

tac tcc cta ccg ctc gcg gtg gcg ctg gcc gat cgt ctc agc gcg ctg 864
Tyr Ser Leu Pro Leu Ala Val Ala Leu Ala Asp Arg Leu Ser Ala Leu
275 280 285

gat gtg ttt acc tct tcc tct gtt cac cag acg att gct cac ttt gcc 912
Asp Val Phe Thr Ser Ser Ser Val His Gln Thr Ile Ala His Phe Ala
290 295 300

cag caa cgt tgg cag caa cag ggg ttt ttc cgc atg ctg aat cgc atg 960
Gln Gln Arg Trp Gln Gln Gln Gly Phe Phe Arg Met Leu Asn Arg Met
305 310 315 320

ttg ttt tta gcc gga ccg gcc gag tca cgc tgg cgt gtg atg cag cgt 1008
Leu Phe Leu Ala Gly Pro Ala Glu Ser Arg Trp Arg Val Met Gln Arg
325 330 335

ttc tat ggc tta ccc gag gat ttg att gcc cgc ttt tat gcg gga aaa 1056
Phe Tyr Gly Leu Pro Glu Asp Leu Ile Ala Arg Phe Tyr Ala Gly Lys
340 345 350
ctc acc gtg acc gat cgg cta cgc att ctg agc ggc aag ccg ccc gtt 1104
Leu Thr Val Thr Asp Arg Leu Arg Ile Leu Ser Gly Lys Pro Pro Val
355 360 365

ccc gtt ttc gcg gca ttg cag gca att atg acg act cat cgt tga 1149
Pro Val Phe Ala Ala Leu Gln Ala Ile Met Thr Thr His Arg
370 375 380

<210> 6
<211> 382
<212> PRT

<213> Pantoea stewartii
<400> 6

Met Gln Pro His Tyr Asp Leu Ile Leu Val Gly Ala Gly Leu Ala Asn
1 5 10 15

Gly Leu Ile Ala Leu Arg Leu Gln Gln Gln His Pro Asp Met Arg Ile
20 25 30

Leu Leu Ile Glu Ala Gly Pro Glu Ala Gly Gly Asn His Thr Trp Ser
35 40 45

Phe His Glu Glu Asp Leu Thr Leu Asn Gln His Arg Trp Ile Ala Pro
50 55 60

Leu Val Val His His Trp Pro Asp Tyr Gln Val Arg Phe Pro Gln Arg
65 70 75 80

Arg Arg His Val Asn Ser Gly Tyr Tyr Cys Val Thr Ser Arg His Phe
85 90 95

Ala Gly Ile Leu Arg Gln Gln Phe Gly Gln His Leu Trp Leu His Thr
100 105 110

Ala Val Ser Ala Val His Ala Glu Ser Val Gln Leu Ala Asp Gly Arg
115 120 125

Ile Ile His Ala Ser Thr Val Ile Asp Gly Arg Gly Tyr Thr Pro Asp
130 135 140

Ser Ala Leu Arg Val Gly Phe Gln Ala Phe Ile Gly Gln Glu Trp Gln
145 150 155 160

Leu Ser Ala Pro His Gly Leu Ser Ser Pro Ile Ile Met Asp Ala Thr
165 170 175

Val Asp Gln Gln Asn Gly Tyr Arg Phe Val Tyr Thr Leu Pro Leu Ser
180 185 190

Ala Thr Ala Leu Leu Ile Glu Asp Thr His Tyr Ile Asp Lys Ala Asn
195 200 205

Leu Gln Ala Glu Arg Ala Arg Gln Asn Ile Arg Asp Tyr Ala Ala Arg
210 215 220

Gln Gly Trp Pro Leu Gln Thr Leu Leu Arg Glu Glu Gln Gly Ala Leu
225 230 235 240

Pro Ile Thr Leu Thr Gly Asp Asn Arg Gln Phe Trp Gln Gln Gln Pro
245 250 255

Gln Ala Cys Ser Gly Leu Arg Ala Gly Leu Phe His Pro Thr Thr Gly
260 265 270

Tyr Ser Leu Pro Leu Ala Val Ala Leu Ala Asp Arg Leu Ser Ala Leu
275 280 285

Asp Val Phe Thr Ser Ser Ser Val His Gln Thr Ile Ala His Phe Ala
290 295 300

Gln Gln Arg Trp Gln Gln Gln Gly Phe Phe Arg Met Leu Asn Arg Met
305 310 315 320

Leu Phe Leu Ala Gly Pro Ala Glu Ser Arg Trp Arg Val Met Gln Arg
325 330 335

Phe Tyr Gly Leu Pro Glu Asp Leu Ile Ala Arg Phe Tyr Ala Gly Lys
340 345 350

Leu Thr Val Thr Asp Arg Leu Arg Ile Leu Ser Gly Lys Pro Pro Val
355 360 365

Pro Val Phe Ala Ala Leu Gln Ala Ile Met Thr Thr His Arg
370 375 380

<210> 7 

<211> 1479 

<212> DNA 

<213> Pantoea stewartii 
<220> 

<221> CDS 

<222> (1)..(1479)

<400> 7

atg aaa cca act acg gta att ggt gcg ggc ttt ggt ggc ctg gca ctg 48
Met Lys Pro Thr Thr Val Ile Gly Ala Gly Phe Gly Gly Leu Ala Leu
1 5 10 15

gca att cgt tta cag gcc gca ggt att cct gtt ttg ctg ctt gag cag 96
Ala Ile Arg Leu Gln Ala Ala Gly Ile Pro Val Leu Leu Leu Glu Gln
20 25 30

cgc gac aag ccg ggt ggc cgg gct tat gtt tat cag gag cag ggc ttt 144
Arg Asp Lys Pro Gly Gly Arg Ala Tyr Val Tyr Gln Glu Gln Gly Phe
35 40 45

act ttt gat gca ggc cct acc gtt atc acc gat ccc agc gcg att gaa 192
Thr Phe Asp Ala Gly Pro Thr Val Ile Thr Asp Pro Ser Ala Ile Glu
50 55 60

gaa ctg ttt gct ctg gcc ggt aaa cag ctt aag gat tac gtc gag ctg 240
Glu Leu Phe Ala Leu Ala Gly Lys Gln Leu Lys Asp Tyr Val Glu Leu
65 70 75 80

ttg ccg gtc acg ccg ttt tat cgc ctg tgc tgg gag tcc ggc aag gtc 288
Leu Pro Val Thr Pro Phe Tyr Arg Leu Cys Trp Glu Ser Gly Lys Val
85 90 95

ttc aat tac gat aac gac cag gcc cag tta gaa gcg cag ata cag cag 336
Phe Asn Tyr Asp Asn Asp Gln Ala Gln Leu Glu Ala Gln Ile Gln Gln
100 105 110

ttt aat ccg cgc gat gtt gcg ggt tat cga gcg ttc ctt gac tat tcg 384
Phe Asn Pro Arg Asp Val Ala Gly Tyr Arg Ala Phe Leu Asp Tyr Ser
115 120 125

cgt gcc gta ttc aat gag ggc tat ctg aag ctc ggc act gtg cct ttt 432
Arg Ala Val Phe Asn Glu Gly Tyr Leu Lys Leu Gly Thr Val Pro Phe
130 135 140

tta tcg ttc aaa gac atg ctt cgg gcc gcg ccc cag ttg gca aag ctg 480
Leu Ser Phe Lys Asp Met Leu Arg Ala Ala Pro Gln Leu Ala Lys Leu
145 150 155 160

cag gca tgg cgc agc gtt tac agt aaa gtt gcc ggc tac att gag gat 528
Gln Ala Trp Arg Ser Val Tyr Ser Lys Val Ala Gly Tyr Ile Glu Asp
165 170 175

gag cat ctt cgg cag gcg ttt tct ttt cac tcg ctc tta gtg ggg ggg 576
Glu His Leu Arg Gln Ala Phe Ser Phe His Ser Leu Leu Val Gly Gly
180 185 190

aat ccg ttt gca acc tcg tcc att tat acg ctg att cac gcg tta gaa 624
Asn Pro Phe Ala Thr Ser Ser Ile Tyr Thr Leu Ile His Ala Leu Glu
195 200 205

cgg gaa tgg ggc gtc tgg ttt cca cgc ggt gga acc ggt gcg ctg gtc 672
Arg Glu Trp Gly Val Trp Phe Pro Arg Gly Gly Thr Gly Ala Leu Val
210 215 220

aat ggc atg atc aag ctg ttt cag gat ctg ggc ggc gaa gtc gtg ctt 720
Asn Gly Met Ile Lys Leu Phe Gln Asp Leu Gly Gly Glu Val Val Leu
225 230 235 240

aac gcc cgg gtc agt cat atg gaa acc gtt ggg gac aag att cag gcc 768
Asn Ala Arg Val Ser His Met Glu Thr Val Gly Asp Lys Ile Gln Ala
245 250 255

gtg cag ttg gaa gac ggc aga cgg ttt gaa acc tgc gcg gtg gcg tcg 816
Val Gln Leu Glu Asp Gly Arg Arg Phe Glu Thr Cys Ala Val Ala Ser
260 265 270

aac gct gat gtt gta cat acc tat cgc gat ctg ctg tct cag cat ccc 864
Asn Ala Asp Val Val His Thr Tyr Arg Asp Leu Leu Ser Gln His Pro
275 280 285

gca gcc gct aag cag gcg aaa aaa ctg caa tcc aag cgt atg agt aac 912
Ala Ala Ala Lys Gln Ala Lys Lys Leu Gln Ser Lys Arg Met Ser Asn
290 295 300

tca ctg ttt gta ctc tat ttt ggt ctc aac cat cat cac gat caa ctc 960
Ser Leu Phe Val Leu Tyr Phe Gly Leu Asn His His His Asp Gln Leu
305 310 315 320

gcc cat cat acc gtc tgt ttt ggg cca cgc tac cgt gaa ctg att cac 1008
Ala His His Thr Val Cys Phe Gly Pro Arg Tyr Arg Glu Leu Ile His
325 330 335

gaa att ttt aac cat gat ggt ctg gct gag gat ttt tcg ctt tat tta 1056
Glu Ile Phe Asn His Asp Gly Leu Ala Glu Asp Phe Ser Leu Tyr Leu
340 345 350

cac gca cct tgt gtc acg gat ccg tca ctg gca ccg gaa ggg tgc ggc 1104
His Ala Pro Cys Val Thr Asp Pro Ser Leu Ala Pro Glu Gly Cys Gly
355 360 365

agc tat tat gtg ctg gcg cct gtt cca cac tta ggc acg gcg aac ctc 1152
Ser Tyr Tyr Val Leu Ala Pro Val Pro His Leu Gly Thr Ala Asn Leu
370 375 380

gac tgg gcg gta gaa gga ccc cga ctg cgc gat cgt att ttt gac tac 1200
Asp Trp Ala Val Glu Gly Pro Arg Leu Arg Asp Arg Ile Phe Asp Tyr
385 390 395 400

ctt gag caa cat tac atg cct ggc ttg cga agc cag ttg gtg acg cac 1248
Leu Glu Gln His Tyr Met Pro Gly Leu Arg Ser Gln Leu Val Thr His
405 410 415

cgt atg ttt acg ccg ttc gat ttc cgc gac gag ctc aat gcc tgg caa 1296
Arg Met Phe Thr Pro Phe Asp Phe Arg Asp Glu Leu Asn Ala Trp Gln
420 425 430

ggt tcg gcc ttc tcg gtt gaa cct att ctg acc cag agc gcc tgg ttc 1344
Gly Ser Ala Phe Ser Val Glu Pro Ile Leu Thr Gln Ser Ala Trp Phe
435 440 445

cga cca cat aac cgc gat aag cac att gat aat ctt tat ctg gtt ggc 1392
Arg Pro His Asn Arg Asp Lys His Ile Asp Asn Leu Tyr Leu Val Gly
450 455 460

gca ggc acc cat cct ggc gcg ggc att ccc ggc gta atc ggc tcg gcg 1440
Ala Gly Thr His Pro Gly Ala Gly Ile Pro Gly Val Ile Gly Ser Ala
465 470 475 480

aag gcg acg gca ggc tta atg ctg gag gac ctg att tga 1479
Lys Ala Thr Ala Gly Leu Met Leu Glu Asp Leu Ile
485 490

<210> 8 
<211> 492 
<212> PRT 

<213> Pantoea stewartii
<400> 8

Met Lys Pro Thr Thr Val Ile Gly Ala Gly Phe Gly Gly Leu Ala Leu
1 5 10 15

Ala Ile Arg Leu Gln Ala Ala Gly Ile Pro Val Leu Leu Leu Glu Gln
20 25 30

Arg Asp Lys Pro Gly Gly Arg Ala Tyr Val Tyr Gln Glu Gln Gly Phe
35 40 45

Thr Phe Asp Ala Gly Pro Thr Val Ile Thr Asp Pro Ser Ala Ile Glu
50 55 60

Glu Leu Phe Ala Leu Ala Gly Lys Gln Leu Lys Asp Tyr Val Glu Leu
65 70 75 80

Leu Pro Val Thr Pro Phe Tyr Arg Leu Cys Trp Glu Ser Gly Lys Val
85 90 95

Phe Asn Tyr Asp Asn Asp Gln Ala Gln Leu Glu Ala Gln Ile Gln Gln
100 105 110

Phe Asn Pro Arg Asp Val Ala Gly Tyr Arg Ala Phe Leu Asp Tyr Ser
115 120 125

Arg Ala Val Phe Asn Glu Gly Tyr Leu Lys Leu Gly Thr Val Pro Phe
130 135 140

Leu Ser Phe Lys Asp Met Leu Arg Ala Ala Pro Gln Leu Ala Lys Leu
145 150 155 160

Gln Ala Trp Arg Ser Val Tyr Ser Lys Val Ala Gly Tyr Ile Glu Asp
165 170 175

Glu His Leu Arg Gln Ala Phe Ser Phe His Ser Leu Leu Val Gly Gly
180 185 190

Asn Pro Phe Ala Thr Ser Ser Ile Tyr Thr Leu Ile His Ala Leu Glu
195 200 205

Arg Glu Trp Gly Val Trp Phe Pro Arg Gly Gly Thr Gly Ala Leu Val
210 215 220

Asn Gly Met Ile Lys Leu Phe Gln Asp Leu Gly Gly Glu Val Val Leu
225 230 235 240

Asn Ala Arg Val Ser His Met Glu Thr Val Gly Asp Lys Ile Gln Ala
245 250 255

Val Gln Leu Glu Asp Gly Arg Arg Phe Glu Thr Cys Ala Val Ala Ser
260 265 270

Asn Ala Asp Val Val His Thr Tyr Arg Asp Leu Leu Ser Gln His Pro
275 280 285

Ala Ala Ala Lys Gln Ala Lys Lys Leu Gln Ser Lys Arg Met Ser Asn
290 295 300

Ser Leu Phe Val Leu Tyr Phe Gly Leu Asn His His His Asp Gln Leu
305 310 315 320

Ala His His Thr Val Cys Phe Gly Pro Arg Tyr Arg Glu Leu Ile His
325 330 335

Glu Ile Phe Asn His Asp Gly Leu Ala Glu Asp Phe Ser Leu Tyr Leu
340 345 350

His Ala Pro Cys Val Thr Asp Pro Ser Leu Ala Pro Glu Gly Cys Gly
355 360 365

Ser Tyr Tyr Val Leu Ala Pro Val Pro His Leu Gly Thr Ala Asn Leu
370 375 380

Asp Trp Ala Val Glu Gly Pro Arg Leu Arg Asp Arg Ile Phe Asp Tyr
385 390 395 400

Leu Glu Gln His Tyr Met Pro Gly Leu Arg Ser Gln Leu Val Thr His
405 410 415

Arg Met Phe Thr Pro Phe Asp Phe Arg Asp Glu Leu Asn Ala Trp Gln
420 425 430

Gly Ser Ala Phe Ser Val Glu Pro Ile Leu Thr Gln Ser Ala Trp Phe
435 440 445

Arg Pro His Asn Arg Asp Lys His Ile Asp Asn Leu Tyr Leu Val Gly
450 455 460

Ala Gly Thr His Pro Gly Ala Gly Ile Pro Gly Val Ile Gly Ser Ala
465 470 475 480

Lys Ala Thr Ala Gly Leu Met Leu Glu Asp Leu Ile
485 490

<210> 9 
<211> 891 
<212> DNA 

<213> Pantoea stewartii 
<220> 

<221> CDS 

<222> (1)..(891)

<400> 9

atg gcg gtt ggc tcg aaa agc ttt gcg act gca tcg acg ctt ttc gac 48
Met Ala Val Gly Ser Lys Ser Phe Ala Thr Ala Ser Thr Leu Phe Asp
1 5 10 15

gcc aaa acc cgt cgc agc gtg ctg atg ctt tac gca tgg tgc cgc cac 96
Ala Lys Thr Arg Arg Ser Val Leu Met Leu Tyr Ala Trp Cys Arg His
20 25 30

tgc gac gac gtc att gac gat caa aca ctg ggc ttt cat gcc gac cag 144
Cys Asp Asp Val Ile Asp Asp Gln Thr Leu Gly Phe His Ala Asp Gln
35 40 45

ccc tct tcg cag atg cct gag cag cgc ctg cag cag ctt gaa atg aaa 192
Pro Ser Ser Gln Met Pro Glu Gln Arg Leu Gln Gln Leu Glu Met Lys
50 55 60

WO 2004/056974 PCT/US2003/041811

acg cgt cag gcc tac gcc ggt tcg caa atg cac gag ccc gct ttt gcc 240
Thr Arg Gln Ala Tyr Ala Gly Ser Gln Met His Glu Pro Ala Phe Ala
65 70 75 80

gcg ttt cag gag gtc gcg atg gcg cat gat atc gct ccc gcc tac gcg 288
Ala Phe Gln Glu Val Ala Met Ala His Asp Ile Ala Pro Ala Tyr Ala
85 90 95

ttc gac cat ctg gaa ggt ttt gcc atg gat gtg cgc gaa acg cgc tac 336
Phe Asp His Leu Glu Gly Phe Ala Met Asp Val Arg Glu Thr Arg Tyr
100 105 110

ctg aca ctg gac gat acg ctg cgt tat tgc tat cac gtc gcc ggt gtt 384
Leu Thr Leu Asp Asp Thr Leu Arg Tyr Cys Tyr His Val Ala Gly Val
115 120 125

gtg ggc ctg atg atg gcg caa att atg ggc gtt cgc gat aac gcc acg 432
Val Gly Leu Met Met Ala Gln Ile Met Gly Val Arg Asp Asn Ala Thr
130 135 140

ctc gat cgc gcc tgc gat ctc ggg ctg gct ttc cag ttg acc aac att 480
Leu Asp Arg Ala Cys Asp Leu Gly Leu Ala Phe Gln Leu Thr Asn Ile
145 150 155 160



gcg cgt gat att gtc gac gat gct cag gtg ggc cgc tgt tat ctg cct 528
Ala Arg Asp Ile Val Asp Asp Ala Gln Val Gly Arg Cys Tyr Leu Pro
165 170 175

gaa agc tgg ctg gaa gag gaa gga ctg acg aaa gcg aat tat gct gcg 576
Glu Ser Trp Leu Glu Glu Glu Gly Leu Thr Lys Ala Asn Tyr Ala Ala
180 185 190

cca gaa aac cgg cag gcc tta agc cgt atc gcc ggg cga ctg gta cgg 624
Pro Glu Asn Arg Gln Ala Leu Ser Arg Ile Ala Gly Arg Leu Val Arg
195 200 205

gaa gcg gaa ccc tat tac gta tca tca atg gcc ggt ctg gca caa tta 672
Glu Ala Glu Pro Tyr Tyr Val Ser Ser Met Ala Gly Leu Ala Gln Leu
210 215 220

ccc tta cgc tcg gcc tgg gcc atc gcg aca gcg aag cag gtg tac cgt 720
Pro Leu Arg Ser Ala Trp Ala Ile Ala Thr Ala Lys Gln Val Tyr Arg
225 230 235 240

aaa att ggc gtg aaa gtt gaa cag gcc ggt aag cag gcc tgg gat cat 768
Lys Ile Gly Val Lys Val Glu Gln Ala Gly Lys Gln Ala Trp Asp His
245 250 255

cgc cag tcc acg tcc acc gcc gaa aaa tta acg ctt ttg ctg acg gca 816
Arg Gln Ser Thr Ser Thr Ala Glu Lys Leu Thr Leu Leu Leu Thr Ala
260 265 270

tcc ggt cag gca gtt act tcc cgg atg aag acg tat cca ccc cgt cct 864
Ser Gly Gln Ala Val Thr Ser Arg Met Lys Thr Tyr Pro Pro Arg Pro
275 280 285

gct cat ctc tgg cag cgc ccg atc tag 891
Ala His Leu Trp Gln Arg Pro Ile
290 295

<210> 10 
<211> 296
<212> PRT

<213> Pantoea stewartii
<400> 10

Met Ala Val Gly Ser Lys Ser Phe Ala Thr Ala Ser Thr Leu Phe Asp
1 5 10 15

Ala Lys Thr Arg Arg Ser Val Leu Met Leu Tyr Ala Trp Cys Arg His
20 25 30

Cys Asp Asp Val Ile Asp Asp Gln Thr Leu Gly Phe His Ala Asp Gln
35 40 45

Pro Ser Ser Gln Met Pro Glu Gln Arg Leu Gln Gln Leu Glu Met Lys
50 55 60

Thr Arg Gln Ala Tyr Ala Gly Ser Gln Met His Glu Pro Ala Phe Ala
65 70 75 80

Ala Phe Gln Glu Val Ala Met Ala His Asp Ile Ala Pro Ala Tyr Ala
85 90 95

Phe Asp His Leu Glu Gly Phe Ala Met Asp Val Arg Glu Thr Arg Tyr
100 105 110

Leu Thr Leu Asp Asp Thr Leu Arg Tyr Cys Tyr His Val Ala Gly Val
115 120 125

Val Gly Leu Met Met Ala Gln Ile Met Gly Val Arg Asp Asn Ala Thr
130 135 140

Leu Asp Arg Ala Cys Asp Leu Gly Leu Ala Phe Gln Leu Thr Asn Ile
145 150 155 160

Ala Arg Asp Ile Val Asp Asp Ala Gln Val Gly Arg Cys Tyr Leu Pro
165 170 175

Glu Ser Trp Leu Glu Glu Glu Gly Leu Thr Lys Ala Asn Tyr Ala Ala
180 185 190

Pro Glu Asn Arg Gln Ala Leu Ser Arg Ile Ala Gly Arg Leu Val Arg
195 200 205

Glu Ala Glu Pro Tyr Tyr Val Ser Ser Met Ala Gly Leu Ala Gln Leu
210 215 220

Pro Leu Arg Ser Ala Trp Ala Ile Ala Thr Ala Lys Gln Val Tyr Arg
225 230 235 240

Lys Ile Gly Val Lys Val Glu Gln Ala Gly Lys Gln Ala Trp Asp His
245 250 255

Arg Gln Ser Thr Ser Thr Ala Glu Lys Leu Thr Leu Leu Leu Thr Ala
260 265 270

Ser Gly Gln Ala Val Thr Ser Arg Met Lys Thr Tyr Pro Pro Arg Pro
275 280 285

Ala His Leu Trp Gln Arg Pro Ile
290 295

<210> 11 

<211> 528 

<212> DNA 

<213> Pantoea stewartii 



<220> 

<221> CDS 

<222> (1)..(528)

<400> 11

atg ttg tgg att tgg aat gcc ctg atc gtg ttt gtc acc gtg atc aac 48
Met Leu Trp Ile Trp Asn Ala Leu Ile Val Phe Val Thr Val Val Gly
1 5 10 15

?? a g , tg gct gca ctg gca cat aaa tac atc atg cac ggc tgg 96
Met Glu Val Val Ala Ala Leu Ala His Lys Tyr Ile Met His Gly Trp
20 25 30

ggt tgg ggc tgg cat ctt tca cat cat gaa ccg cgt aaa ggc gca ttt 144
Gly Trp Gly Trp His Leu Ser His His Glu Pro Arg Lys Gly Ala Phe
35 40 45

2 aa S*? aac gat ctc tat gcc gtg gta ttc gcc att gtg tcg att gcc 192
Glu Val Asn Asp Leu Tyr Ala Val Val Phe Ala Ile Val Ser Ile Ala
50 55 60

ctg att tac ttc ggc agt aca gga atc tgg ccg ctc cag tgg att ggt 240
Leu Ile Tyr Phe Gly Ser Thr Gly Ile Trp Pro Leu Gln Trp Ile Gly
65 70 75 80

?? a ?? c » tg acc gct tat ggt tta ctg tat ttt atg gtc cac gac gga 288
Ala Gly Met Thr Ala Tyr Gly Leu Leu Tyr Phe Met Val His Asp Gly
85 90 95

?!?. x 9 /!? S ac 5 ag !r gc tgg ccg ttc cgc tac ata ccg cgc aaa ggc tac 336
Leu Val His Gln Arg Trp Pro Phe Arg Tyr Ile Pro Arg Lys Gly Tyr
100 105 110

f aa ?" ? ta tac atg 9 ? c cac cat atg cat cat gct gta agg gga 384
Leu Lys Arg Leu Tyr Met Ala His Arg Met His His Ala Val Arg Gly
115 120 125

ff:t 2? 9 §? c 5 gc gtg tcc ttt ggt ttt ctg tac gcg cca ccg tta tct 432
Lys Glu Gly Cys Val Ser Phe Gly Phe Leu Tyr Ala Pro Pro Leu Ser
130 135 140

aaa ctt cag gcg acg ctg aga gaa agg cat gcg gct aga tcg ggc gct 480
Lys Leu Gln Ala Thr Leu Arg Glu Arg His Ala Ala Arg Ser Gly Ala
145 150 155 160

fif a 2 a ? at ?? g 5? g gac ggg gt ? gat acg tct tca tcc ggg aag taa 528
Ala Arg Asp Glu Gln Asp Gly Val Asp Thr Ser Ser Ser Gly Lys
165 170 175

<210> 12 
<211> 175 
<212> PRT 

<213> Pantoea stewartii 
<400> 12

Met Leu Trp Ile Trp Asn Ala Leu Ile Val Phe Val Thr Val Val Gly
1 5 10 15

Met Glu Val Val Ala Ala Leu Ala His Lys Tyr Ile Met His Gly Trp
20 25 30

Gly Trp Gly Trp His Leu Ser His His Glu Pro Arg Lys Gly Ala Phe
35 40 45

Glu Val Asn Asp Leu Tyr Ala Val Val Phe Ala Ile Val Ser Ile Ala
50 55 60

Leu Ile Tyr Phe Gly Ser Thr Gly Ile Trp Pro Leu Gln Trp Ile Gly
65 70 75 80

Ala Gly Met Thr Ala Tyr Gly Leu Leu Tyr Phe Met Val His Asp Gly
85 90 95

Leu Val His Gln Arg Trp Pro Phe Arg Tyr Ile Pro Arg Lys Gly Tyr
100 105 110

Leu Lys Arg Leu Tyr Met Ala His Arg Met His His Ala Val Arg Gly
115 120 125

Lys Glu Gly Cys Val Ser Phe Gly Phe Leu Tyr Ala Pro Pro Leu Ser
130 135 140

Lys Leu Gln Ala Thr Leu Arg Glu Arg His Ala Ala Arg Ser Gly Ala
145 150 155 160

Ala Arg Asp Glu Gln Asp Gly Val Asp Thr Ser Ser Ser Gly Lys
165 170 175

<210> 13 
<211> 25 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> First primer used to amplify carotenoid gene cluster. 
<400> 13 

atgacggtct gcgcaaaaaa acacg 25 

<210> 14 
<211> 28 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Second primer used to amplify carotenoid gene cluster. 
<400> 14 

gagaaattat gttgtggatt tggaatgc 28 
<210> 15
<211> 21
<212> DNA

<213> Artificial sequence
<220>

<223> Primer Tn5PCRF.

<400> 15

gctgagttga aggatcagat c 21

<210> 16
<211> 21
<212> DNA

<213> Artificial sequence
<220>

<223> Primer Tn5PCRR.

<400> 16

cgagcaagac gtttcccgtt g 21

<210> 17
<211> 25
<212> DNA

<213> Artificial sequence
<220>

<223> Primer Kan-2 FP-1

<400> 17

acctacaaca aagctctcat caacc 25

<210> 18
<211> 25
<212> DNA

<213> Artificial sequence
<220>

<223> Primer Kan-2 RP-1

<400> 18

gcaatgtaac atcagagatt ttgag 25

<210> 19
<211> 20
<212> DNA

<213> Artificial sequence
<220>

<223> Primer Y1_F

<400> 19

agcaccatga tcatctggcg 20

<210> 20
<211> 20
<212> DNA

<213> Artificial sequence
<220>

<223> Primer Y1_R

<400> 20

cggttgcgct ggaagaaaac 20



<210> 21 

<211> 20 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer Y4_F 

<400> 21 

caccctgtgc cattttcagc 20 



<210> 22 

<211> 20 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer Y4_R

<400> 22

cgttctgggt atggcccaga 20

<210> 23
<211> 20
<212> DNA

<213> Artificial sequence
<220>

<223> Primer Y8_1_F

<400> 23

aaagctaacc cgtggcagca 20

<210> 24
<211> 20
<212> DNA

<213> Artificial sequence
<220>

<223> Primer Y8_1_R

<400> 24

tttgcgttcc ccgaggcata 20

<210> 25
<211> 20
<212> DNA

<213> Artificial sequence
<220>

<223> Primer Y12_F

<400> 25

ttccgaaatg gcgtcagctc 20



<210> 26 
<211> 25 
<212> DNA

<213> Artificial sequence
<220>

<223> Primer Y12_R
<400> 26

atctctacat tgattatgag tattc 25

<210> 27
<211> 20
<212> DNA

<213> Artificial sequence
<220>

<223> Primer Y15_F

<400> 27

ggatcgatct tgagatgacc 20

<210> 28
<211> 24
<212> DNA

<213> Artificial sequence
<220>

<223> Primer Y15_R

<400> 28

gctttcgtaa ttttcgcatt tctg 24

<210> 29
<211> 20
<212> DNA

<213> Artificial sequence
<220>

<223> Primer Y16_F

<400> 29

cacgccaagt tgcgcaagta 20

<210> 30
<211> 20
<212> DNA

<213> Artificial sequence
<220>

<223> Primer Y16_R

<400> 30

gcagaaaatg gtgactcagg 20

<210> 31
<211> 20
<212> DNA

<213> Artificial sequence
<220>

<223> Primer Y17_F

<400> 31

ggcgatcctc gtcgatttct 20
<210> 32

<211> 20 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer Y17_R 

<400> 32 

acgcagacga gagtttgcgt 20 



<210> 33 

<211> 20 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer Y23_F 

<400> 33 

accgaatgcc cttgctgttg 20



<210> 34 

<211> 20 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer Y2l_R 

<400> 34 

gggtgttcag gtatggctta 20



<210> 35 
<211> 3159 
<212> DNA 

<213> Escherichia coli 
<400> 35 

atgcctgtta taactcttcc tgatggcagc caacgccatt acgatcacgc tgtaagcccc 60 

atggatgttg cgctggacat tggtccaggt ctggcgaaag cctgtatcgc agggcgcgtt 120 

aatggcgaac tggttgatgc ttgcgatctg attgaaaacg acgcacaact gtcgatcatt 180 

accgccaaag acgaagaagg tctggagatc attcgtcact cctgtgcgca cctgttaggg 240 

cacgcgatta aacaactttg gccgcatacc aaaatggcaa tcggcccggt tattgacaac 300 

ggtttttatt acgacgttga tcttgaccgc acgttaaccc aggaagatgt cgaagcactc 360 

gagaagcgga tgcatgagct tgctgagaaa aactacgacg tcattaagaa gaaagtcagc 420 

tggcacgaag cgcgtgaaac tttcgccaac cgtggggaga gctacaaagt ctccattctt 480 

gacgaaaaca tcgcccatga tgacaagcca ggtctgtact tccatgaaga atatgtcgat 540 

atgtgccgcg gtccgcacgt accgaacatg cgtttctgcc atcatttcaa actaatgaaa 600 

acggcagggg cttactggcg tggcgacagc aacaacaaaa tgttgcaacg tatttacggt 660 

acggcgtggg cagacaaaaa agcacttaac gcttacctgc agcgcctgga agaagccgcg 720 
aaacgcgacc accgtaaaat cggtaaacag ctcgacctgt accatatgca ggaagaagcg 780

ccgggtatgg tattctggca caacgacggc tggaccatct tccgtgaact ggaagtgttt 840 

gttcgttcta aactgaaaga gtaccagtat caggaagtta aaggtccgtt catgatggac 900 

cgtgtcctgt gggaaaaaac cggtcactgg gacaactaca aagatgcaat gttcaccaca 960 

tcttctgaga accgtgaata ctgcattaag ccgatgaact gcccgggtca cgtacaaatt 1020 

ttcaaccagg ggctgaagtc ttatcgcgat ctgccgctgc gtatggccga gtttggtagc 1080 

tgccaccgta acgagccgtc aggttcgctg catggcctga tgcgcgtgcg tggatttacc 1140 

caggatgacg cgcatatctt ctgtactgaa gaacaaattc gcgatgaagt taacggatgt 1200 

atccgtttag tctatgatat gtacagcact tttggcttcg agaagatcgt cgtcaaactc 1260 

tccactcgtc ctgaaaaacg tattggcagc gacgaaatgt gggatcgtgc tgaggcggac 1320 

ctggcggttg cgctggaaga aaacaacatc ccgtttgaat atcaactggg tgaaggcgct 1380 

ttctacggtc cgaaaattga atttaccctg tatgactgcc tcgatcgtgc atggcagtgc 1440 

ggtacagtac agctggactt ctctttgccg tctcgtctga gcgcttctta tgtaggcgaa 1500 

gacaatgaac gtaaagtacc ggtaatgatt caccgcgcaa ttctggggtc gatggaacgt 1560 

ttcatcggta tcctgaccga agagttcgct ggtttcttcc cgacctggct tgcgccggtt 1620 

caggttgtta tcatgaatat taccgattca cagtctgaat acgttaacga attgacgcaa 1680 

aaactatcaa atgcgggcat tcgtgttaaa gcagacttga gaaatgagaa gattggcttt 1740 

aaaatccgcg agcacacttt gcgtcgcgtc ccatatatgc tggtctgtgg tgataaagag 1800 

gtggaatcag gcaaagttgc cgttcgcacc cgccgtggta aagacctggg aagcatggac 1860 

gtaaatgaag tgatcgagaa gctgcaacaa gagattcgca gccgcagtct taaacctgtc 1920 

tcttatacac atctcaacca tcatcgatga attgtgtctc aaaatctctg atgttacatt 1980 

gcacaagata aaaatatatc atcatgaaca ataaaactgt ctgcttacat aaacagtaat 2040 

acaaggggtg ttatgagcca tattcaacgg gaaacgtctt gctcgaggcc gcgattaaat 2100 

tccaacatgg atgctgattt atatgggtat aaatgggctc gcgataatgt cgggcaatca 2160 

ggtgcgacaa tctatcgatt gtatgggaag cccgatgcgc cagagttgtt tctgaaacat 2220 

ggcaaaggta gcgttgccaa tgatgttaca gatgagatgg tcagactaaa ctggctgacg 2280 

gaatttatgc ctcttccgac catcaagcat tttatccgta ctcctgatga tgcatggtta 2340 

ctcaccactg cgatccccgg aaaaacagca ttccaggtat tagaagaata tcctgattca 2400 

ggtgaaaata ttgttgatgc gctggcagtg ttcctgcgcc ggttgcattc gattcctgtt 2460 

tgtaattgtc cttttaacag cgatcgcgta tttcgtctcg ctcaggcgca atcacgaatg 2520 

aataacggtt tggttgatgc gagtgatttt gatgacgagc gtaatggctg gcctgttgaa 2580

caagtctgga aagaaatgca taaacttttg ccattctcac cggattcagt cgtcactcat 2640 

ggtgatttct cacttgataa ccttattttt gacgagggga aattaatagg ttgtattgat 2700 

gttggacgag tcggaatcgc agaccgatac caggatcttg ccatcctatg gaactgcctc 2760 
ggtgagtttt ctccttcatt acagaaacgg ctttttcaaa aatatggtat tgataatcct 2820

gatatgaata aattgcagtt tcatttgatg ctcgatgagt ttttctaatc agaattggtt 2880 

aattggttgt aacactggca gagcattacg ctgacttgac gggacggcgg ctttgttgaa 2940 

taaatcgaac ttttgctgag ttgaaggatc agatcacgca tcttcccgac aacgcagacc 3000 

gttccgtggc aaagcaaaag ttcaaaatca ccaactggtc cacctacaac aaagctctca 3060 

tcaaccgtgg cggggatcct ctagagtcga cctgcaggca tgcaagcttc agggttgaga 3120 

tgtgtataag agacaggtct taaacaattg gaggaataa 3159 

<210> 36 
<211> 3171 
<212> DNA 

<213> Escherichia coli 
<400> 36 

atgatgagtt atgtagactg gccgccatta attttgaggc acacgtacta catggctgaa 60 

ttcgaaacca cttttgcaga tctgggcctg aaggctccta tccttgaagc ccttaacgat 120 

ctgggttacg aaaaaccatc tccaattcag gcagagtgta ttccacatct gctgaatggc 180 

cgcgacgttc tgggtatggc ccagacgggg agcggaaaaa ctgcagcatt ctctttacct 240 

ctgttgcaga atcttgatcc tgagctgaaa gcaccacaga ttctggtgct ggcaccgacc 300 

cgcgaactgg cggtacaggt tgctgaagca atgacggatt tctctaaaca catgcgcggc 360 

gtaaatgtgg ttgctctgta cggcggccag cgttatgacg tgcaattacg cgccctgcgt 420 

caggggccgc agatcgttgt cggtactccg ggccgtctgc tggaccacct gaaacgtggc 480 

actctggacc tctctaaact gagcggtctg gttctggatg aagctgacga aatgctgcgc 540 

atgggcttca tcgaagacgt tgaaaccatt atggcgcaga tcccggaagg tcatcagacc 600 

gctctgttct ctgcaaccat gccggaagcg attcgtcgca ttacccgccg ctttatgaaa 660 

gagccgcagg aagtgcgcat tcagtccagc gtgactaccc gtcctgacat cagccagagc 720 

tactggactg tctggggtat gcgcaaaaac gaagcactgg tacgctgtct cttatacaca 780 

tctcaaccat catcgatgaa ttgtgtctca aaatctctga tgttacattg cacaagataa 840 

aaatatatca tcatgaacaa taaaactgtc tgcttacata aacagtaata caaggggtgt 900 

tatgagccat attcaacggg aaacgtcttg ctcgaggccg cgattaaatt ccaacatgga 960 

tgctgattta tatgggtata aatgggctcg cgataatgtc gggcaatcag gtgcgacaat 1020 

ctatcgattg tatgggaagc ccgatgcgcc agagttgttt ctgaaacatg gcaaaggtag 1080 

cgttgccaat gatgttacag atgagatggt cagactaaac tggctgacgg aatttatgcc 1140 

tcttccgacc atcaagcatt ttatccgtac tcctgatgat gcatggttac tcaccactgc 1200 

gatccccgga aaaacagcat tccaggtatt agaagaatat cctgattcag gtgaaaatat 1260 

tgttgatgcg ctggcagtgt tcctgcgccg gttgcattcg attcctgttt gtaattgtcc 1320 

ttttaacagc gatcgcgtat ttcgtctcgc tcaggcgcaa tcacgaatga ataacggttt 1380 

ggttgatgcg agtgattttg atgacgagcg taatggctgg cctgttgaac aagtctggaa 1440 
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agaaatgcat aaacttttgc cattctcacc ggattcagtc gtcactcatg gtgatttctc 1500 

acttgataac cttatttttg acgaggggaa attaataggt tgtattgatg ttggacgagt 1560 

cggaatcgca gaccgatacc aggatcttgc catcctatgg aactgcctcg gtgagttttc 1620 

tccttcatta cagaaacggc tttttcaaaa atatggtatt gataatcctg atatgaataa 1680 

attgcagttt catttgatgc tcgatgagtt tttctaatca gaattggtta attggttgta 1740 

acactggcag agcattacgc tgacttgacg ggacggcggc tttgttgaat aaatcgaact 1800 

tttgctgagt tgaaggatca gatcacgcat cttcccgaca acgcagaccg ttccgtggca 1860 

aagcaaaagt tcaaaatcac caactggtcc acctacaaca aagctctcat caaccgtggc 1920 

ggggatcctc tagagtcgac ctgcaggcat gcaagcttca gggttgagat gtgtataaga 1980 

gacagactgg tacgtttcct ggaagcggaa gattttgatg cggcgattat cttcgttcgt 2040 

accaaaaacg cgactctgga agtggctgaa gctcttgagc gtaacggcta caacagcgcc 2100 

gcgctgaacg gtgacatgaa ccaggcgctg cgtgaacaga cactggaacg cctgaaagat 2160 

ggtcgtctgg acatcctgat tgcgaccgac gttgcagccc gtggcctgga cgttgagcgt 2220 

atcagcctgg tagttaacta cgatatcccg atggattctg agtcttacgt tcaccgtatc 2280 

ggtcgtaccg gtcgtgcggg tcgtgctggc cgcgcgctgc tgttcgttga gaaccgcgag 2340 

cgtcgtctgc tgcgcaacat tgaacgtact atgaagctga ctattccgga agtagaactg 2400 

ccgaacgcag aactgctagg caaacgccgt ctggaaaaat tcgccgctaa agtacagcag 2460 

cagctggaaa gcagcgatct ggatcaatac cgcgcactgc tgagcaaaat tcagccgact 2520 

gctgaaggtg aagagctgga tctcgaaact ctggctgcgg cactgctgaa aatggcacag 2580 

ggtgaacgta ctctgatcgt accgccagat gcgccgatgc gtccgaaacg tgaattccgt 2640 

gaccgtgatg accgtggtcc gcgcgatcgt aacgaccgtg gcccgcgtgg tgaccgtgaa 2700 

gatcgtccgc gtcgtgaacg tcgtgatgtt ggcgatatgc agctgtaccg cattgaagtg 2760 

ggccgcgatg atggtgttga agttcgtcat atcgttggtg cgattgctaa cgaaggcgac 2820 

atcagcagcc gttacattgg taacatcaag ctgtttgctt ctcactccac catcgaactg 2880 

ccgaaaggta tgccgggtga agtgctgcaa cactttacgc gcactcgcat tctcaacaag 2940 

ccgatgaaca tgcagttact gggcgatgca cagccgcata ctggcggtga gcgtcgtggc 3000 

ggtggtcgtg gtttcggtgg cgaacgtcgt gaaggcggtc gtaacttcag cggtgaacgc 3060 

cgtgaaggtg gccgtggtga tggtcgtcgt tttagcggcg aacgtcgtga aggccgcgct 3120 

ccgcgtcgtg atgattctac cggtcgtcgt cgtttcggtg gtgatgcgta a 3171 

<210> 37
<211> 2904
<212> DNA 

<213> Escherichia coli 
<400> 37 

atgactgaat cttttgctca actctttgaa gagtccttaa aagaaatcga aacccgcccg 60 
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ggttctatcg ttcgtggcgt tgttgttgct atcgacaaag acgtagtact ggttgacgct 120 

ggtctgaaat ctgagtccgc catcccggct gagcagttca aaaacgccca gggcgagctg 180 

gaaatccagg taggtgacga agttgacgtt gctctggacg cagtagaaga cggcttcggt 240 

gaaactctgc tgtcccgtga gaaagctaaa cgtcacgaag cctggatcac gctggaaaaa 300 

gcttacgaag atgctgaaac tgttaccggt gttatcaacg gcaaagttaa gggcggcttc 360 

actgttgagc tgaacggtat tcgtgcgttc ctgccaggtt ctctggtaga cgttcgtccg 420 

gtgcgtgaca ctctgcacct ggaaggcaaa gagcttgaat ttaaagtaat caagctggat 480 

cagaagcgca acaacgttgt tgtttctcgt cgtgccgtta tcgaatccga aaacagcgca 540 

gagcgcgatc agctgctgga aaacctgcag gaaggcatgg aagttaaagg tatcgttaag 600 

aacctcactg actacggtgc attcgttgat ctgggcggcg ttgacggcct gctgcacatc 660 

actgacatgg cctggaaacg cgttaagcat ccgagcgaaa tcgtcaacgt gggcgacgaa 720 

atcactgtta aagtgctgaa gttcgaccgc gaacgtaccc gtgtatccct gggcctgaaa 780 

cagctgggcg aagatccgtg ggtagctatc gctaaacgtt atccggaagg taccaaactg 840 

actggtcgcg tgaccaacct gaccgactac ggctgcttcg ttgaaatcga agaaggcgtt 900 

gaaggcctgg tacacgtttc cgaaatggac tggaccaaca aaaacatcca cccgtccaaa 960 

gttgttaacg ttggcgatgt agtggaagtt atggttctgg atatcgacga agaacgtcgt 1020 

cgtatctccc tgggtctgaa acagtgcaaa gctaacccgt ggcagcagtt cgcggaaacc 1080 

cacaacaagg gcgaccgtgt tgaaggtaaa atcaagtcta tcactgactt cggtatcttc 1140 

atcggcttgg acggcggcat cgacggcctg gttcacctgt ctgacatctc ctggaacgtt 1200 

gcaggcgaag aagcagttcg tgaatacaaa aaaggcgacg aaatcgctgc agttgttctg 1260 

caggttgacg cagaacgtga acgtatctcc ctgggcgtta aacagctcgc agaagatccg 1320 

ttcaacaact gggttgctct gaacaagaaa ggcgctatcg taaccggtaa agtaactgca 1380 

gttgacgcta aaggcgcaac cgtagaactg gctgacggcg ttgaaggtta cctgcgtgct 1440 

tctgaagcat cccgtgaccg cgttgaagac gctaccctgg ttctgagcgt tggcgacgaa 1500 

gttgaagcta aattcaccgg cgttgatcgt aaaaaccgcg caatcagcct gtctgttcgt 1560 

gcgaaagacg aagctgacga gaaagatgca atcgcaactg tctcttatac acatctcaac 1620 

cctgaagctt gcatgcctgc aggtcgactc tagaggatcc ccgccacggt tgatgagagc 1680 

tttgttgtag gtggaccagt tggtgatttt gaacttttgc tttgccacgg aacggtctgc 1740 

gttgtcggga agatgcgtga tctgatcctt caactcagca aaagttcgat ttattcaaca 1800 

aagccgccgt cccgtcaagt cagcgtaatg ctctgccagt gttacaacca attaaccaat 1860 

tctgattaga aaaactcatc gagcatcaaa tgaaactgca atttattcat atcaggatta 1920 

tcaataccat atttttgaaa aagccgtttc tgtaatgaag gagaaaactc accgaggcag 1980 

ttccatagga tggcaagatc ctggtatcgg tctgcgattc cgactcgtcc aacatcaata 2040 

caacctatta atttcccctc gtcaaaaata aggttatcaa gtgagaaatc accatgagtg 2100 
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acgactgaat 
ccggtgagaa tggcaaaagt ttatgcattt ctttccagac ttgttcaaca 2160

CL-2028PCT.ST25.txt

ggccagccat tacgctcgtc atcaaaatca ctcgcatcaa ccaaaccgtt attcattcgt 2220

gattgcgcct gagcgagacg aaatacgcga tcgctgttaa aaggacaatt acaaacagga 2280

atcgaatgca accggcgcag gaacactgcc agcgcatcaa caatattttc acctgaatca 2340

ggatattctt ctaatacctg gaatgctgtt tttccgggga tcgcagtggt gagtaaccat 2400

gcatcatcag gagtacggat aaaatgcttg atggtcggaa gaggcataaa ttccgtcagc 2460

cagtttagtc tgaccatctc atctgtaaca tcattggcaa cgctaccttt gccatgtttc 2520

agaaacaact ctggcgcatc gggcttccca tacaatcgat agattgtcgc acctgattgc 2580

ccgacattat cgcgagccca tttataccca tataaatcag catccatgtt ggaatttaat 2640


cgcggcctcg agcaagacgt ttcccgttga atatggctca taacacccct tgtattactg 2700


tttatgtaag cagacagttt tattgttcat gatgatatat ttttatcttg tgcaatgtaa 2760

catcagagat tttgagacac aattcatcga tgatggttga gatgtgtata agagacagca 2820

atcgcaactg ttaacaaaca ggaagatgca




araarnraa'f 
cH_d.cH_yL.act L 


ggctgaagct 2880

ttcaaagcag ctaaaggcga gtaa 2904


<210> 38 
<211> 5454 
<212> DNA 

<213> Escherichia coli
<400> 38

gtgaaagatt tattaaagtt tctgaaagcg cagactaaaa ccgaagagtt tgatgcgatc 60

aaaattgctc tggcttcgcc agacatgatc cgttcatggt ctttcggtga agttaaaaag 120

ccggaaacca tcaactaccg tacgttcaaa ccagaacgtg acggcctttt ctgcgcccgt 180

atctttgggc cggtaaaaga ttacgagtgc ctgtgcggta agtacaagcg cctgaaacac 240

cgtggcgtca tctgtgagaa gtgcggcgtt gaagtgaccc agactaaagt acgccgtgag 300

cgtatgggcc acatcgaact ggcttccccg actgcgcaca tctggttcct gaaatcgctg 360

ccgtcccgta tcggtctgct gctcgatatg ccgctgcgcg atatcgaacg cgtactgtac 420

tttgaatcct atgtggttat cgaaggcggt atgaccaacc tggaacgtca gcagatcctg 480

actgaagagc agtatctgga cgcgctggaa gagttcggtg acgaattcga cgcgaagatg 540

ggggcggaag caatccaggc tctgctgaag agcatggatc tggagcaaga gtgcgaacag 600

ctgcgtgaag agctgaacga aaccaactcc gaaaccaagc gtaaaaagct gaccaagcgt 660

atcaaactgc tggaagcgtt cgttcagtct ggtaacaaac cagagtggat gatcctgacc 720

gttctgccgg tactgccgcc agatctgcgt ccgctggttc cgctggatgg tggtcgtttc 780

gcgacttctg acctgaacga tctgtatcgt cgcgtcatta accgtaacaa ccgtctgaaa 840

cgtctgctgg atctggctgc gccggacatc atcgtacgta acgaaaaacg tatgctgcag 900

gaagcggtag acgccctgct ggataacggt cgtcgcggtc gtgcgatcac cggttctaac 960

aagcgtcctc tgaaatcttt ggccgacatg atcaaaggta aacagggtcg
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aacctgctcg 
gtaagcgtgt tgactactcc ggtcgttctg taatcaccgt aggtccatac 1080

ctgcgtctgc atcagtgcgg tctgccgaag aaaatggcac tggagctgtt caaaccgttc 1140

atctacggca agctggaact gcgtggtctt gctaccacca ttaaagctgc gaagaaaatg 1200

gttgagcgcg aagaagctgt cgtttgggat atcctggacg aagttatccg cgaacacccg 1260

gtactgctga accgtgcacc gactctgcac cgtctgggta tccaggcatt tgaaccggta 1320

ctgatcgaag gtaaagctat ccagctgcac ccgctggttt gtgcggcata taacgccgac 1380

ttcgatggtg accagatggc tgttcacgta ccgctgacgc tggaagccca gctggaagcg 1440

cgtgcgctga tgatgtctac caacaacatc ctgtccccgg cgaacggcga accaatcatc 1500

gttccgtctc aggacgttgt actgggtctg tactacatga cccgtgactg tgttaacgcc 1560

aaaggcgaag gcatggtgct gactggcccg aaagaagcag aacgtctgta tcgctctggt 1620

ctggcttctc tgcatgcgcg cgttaaagtg cgtatcaccg agtatgaaaa agatgctaac 1680

ggtgaattag tagcgaaaac cagcctgaaa gacacgactg ttggccgtgc cattctgtgg 1740

atgattgtac cgaaaggtct gccttactcc atcgtcaacc aggcgctggg taaaaaagca 1800

atctccaaaa tgctgaacac ctgctaccgc attctcggtc tgaaaccgac cgttattttt 1860

gcggaccaga tcatgtacac cggcttcgcc tatgcagcgc gttctggtgc atctgttggt 1920

atcgatgaca tggtcatccc ggagaagaaa cacgaaatca tctccgaggc agaagcagaa 1980

gttgctgaaa ttcaggagca gttccagtct ggtctggtaa ctgcgggcga acgctacaac 2040

aaagttatcg atatctgggc tgcggcgaac gatcgtgtat ccaaagcgat gatggataac 2100

ctgcaaactg aaaccgtgat taaccgtgac ggtcaggaag agaagcaggt ttccttcaac 2160

agcatctaca tgatggccga ctccggtgcg cgtggttctg cggcacagat tcgtcagctt 2220

gctggtatgc gtggtctgat ggcgaagccg gatggctcca tcatcgaaac gccaatcacc 2280

gcgaacttcc gtgaaggtct gaacgtactc cagtacttca tctccaccca cggtgctcgt 2340

aaaggtctgg cggataccgc actgaaaact gcgaactccg gttacctgac tcgtcgtctg 2400

gttgacgtgg cgcaggacct ggtggttacc gaagacgatt gtggtaccca tgaaggtatc 2460

atgatgactc cggttatcga gggtggtgac gttaaagagc cgctgcgcga tcgcgtactg 2520

ggtcgtgtaa ctgctgaaga cgttctgaag ccgggtactg ctgatatcct cgttccgcgc 2580

aacacgctgc tgcacgaaca gtggtgtgac ctgctggaag agaactctgt cgacgcggtt 2640

aaagtacgtt ctgttgtatc ttgtgacacc gactttggtg tatgtgcgca ctgctacggt 2700

cgtgacctgg cgcgtggcca catcatcaac aagggtgaag caatcggtgt


tatrnrnnra 

u a l ty eg y ta. 


LI DU 


cagtccatcg gtgaaccggg tacacagctg accatgcgta cgttccacat cggtggtgcg 2820

gcatctcgtg cggctgctga atccagcatc caagtgaaaa acaaaggtag catcaagctc 2880

agcaacgtga agtcggttgt gaactccagc ggtaaactgg ttatcacttc ccgtaatact 2940

gaactgaaac tgatcgacga attcggtcgt actaaagaaa gctacaaagt accttacggt 3000

gcggtactgg cgaaaggcga tggcgaacag gttgctggcg gcgaaaccgt
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gacccgcaca 
ccatgccggt tatcaccgaa gtaagcggtt ttgtacgctt tactgacatg 3120

atcgacggcc agaccattac gcgtcagacc gacgaactga ccggtctgtc ttcgctggtg 3180

gttctggatt ccgcagaacg taccgcaggt ggtaaagatc tgcgtccggc actgaaaatc 3240

gttgatgctc agggtaacga cgttctgatc ccaggtaccg atatgccagc gcagtacttc 3300

ctgccgggta aagcgattgt tcagctggaa gatggcgtac agatcagctc tggtgacacc 3360

ctggcgcgta ttccgcagga atccggcggt accaaggaca tcaccggtgg tctgccgcgc 3420

gttgcggacc tgttcgaagc acgtcgtccg aaagagccgg caatcctggc tgaaatcagc 3480

ggtatcgttt ccttcggtaa agaaaccaaa ggtaaacgtc gtctggttat caccccggta 3540

gacggtagcg atccgtacga agagatgatt ccgaaatggc gtcagctcaa cgtgttcgaa 3600

ggtgaacgtg tagaacgtgg tgacgtaatt tccgacggtc cggaagcgcc gcacgacatt 3660

ctgcgtctgc gtggtgttca tgctgttact cgttacatcg ttaacgaagt acaggacgta 3720

taccgtctgc agggcgttaa gattaacgat aaacacatcg aagttatcgt tcgtcagatg 3780

ctgcgtaaag ctaccatcgt taacgcgggt agctccgact tcctggaagg cgaacaggtt 3840

gaatactctc gcgtcaagat cgcaaaccgc gaactggaag cgaacggcaa agtgggtgca 3900

acttactccc gcgatctgct gggtatcacc aaagcgtctc tggcaaccga gtccttcatc 3960

tccgcggcat cgttccagga gaccactcgc gtgctgaccg aagcagccgt tgcgggcaaa 4020

cgcgacgaac tgcgcggcct gaaagagaac gttatcgtgg gtcgtctgat cccggcaggt 4080

accggttacg cgtaccacca ggatcgtatg cgtcgccgtg ctgcgggtga agctctgtct 4140

cttatacaca tctcaaccct gaagcttgca tgcctgcagg tcgactctag aggatccccg 4200

ccacggttga tgagagcttt gttgtaggtg gaccagttgg tgattttgaa cttttgcttt 4260

gccacggaac ggtctgcgtt gtcgggaaga tgcgtgatct gatccttcaa ctcagcaaaa 4320

gttcgattta ttcaacaaag ccgccgtccc gtcaagtcag cgtaatgctc tgccagtgtt 4380

acaaccaatt aaccaattct gattagaaaa actcatcgag catcaaatga aactgcaatt 4440

tattcatatc aggattatca ataccatatt tttgaaaaag ccgtttctgt aatgaaggag 4500

aaaactcacc gaggcagttc cataggatgg caagatcctg gtatcggtct gcgattccga 4560

ctcgtccaac atcaatacaa cctattaatt tcccctcgtc aaaaataagg ttatcaagtg 4620

agaaatcacc atgagtgacg actgaatccg gtgagaatgg caaaagttta tgcatttctt 4680

tccagacttg ttcaacaggc cagccattac gctcgtcatc aaaatcactc gcatcaacca 4740


aaccgttatt cattcgtgat tgcgcctgag cgagacgaaa tacgcgatcg ctgttaaaag 4800


gacaattaca aacaggaatc gaatgcaacc ggcgcaggaa cactgccagc gcatcaacaa 4860

tattttcacc tgaatcagga tattcttcta atacctggaa tgctgttttt ccggggatcg 4920

cagtggtgag taaccatgca tcatcaggag tacggataaa atgcttgatg gtcggaagag 4980

gcataaattc cgtcagccag tttagtctga ccatctcatc tgtaacatca ttggcaacgc 5040

tacctttgcc atgtttcaga aacaactctg gcgcatcggg cttcccatac
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ttgtcgcacc tgattgcccg acattatcgc gagcccattt atacccatat aaatcagcat 5160 

ccatgttgga atttaatcgc ggcctcgagc aagacgtttc ccgttgaata tggctcataa 5220 

caccccttgt attactgttt atgtaagcag acagttttat tgttcatgat gatatatttt 5280 

tatcttgtgc aatgtaacat cagagatttt gagacacaat tcatcgatga tggttgagat 5340 

gtgtataaga gacagggtga agctccggct gcaccgcagg tgactgcaga agacgcatct 5400 

gccagcctgg cagaactgct gaacgcaggt ctgggcggtt ctgataacga gtaa 5454 

<210> 39 
<211> 1845 
<212> DNA 

<213> Escherichia coli 
<400> 39 

atgggcaaaa catctatgat acacgcaatt gtggatcaat atagtcactg tgaatgggtg 60 

gaaaatagca tgagtgccaa tgaaaacaac ctgatttgga tcgatcttga gatgaccggt 120 

ctggatcccg agcgcgatcg cattattgag attgccacgc tggtgaccga tgccaacctg 180 

aatattctgg cagaagggcc gaccattgca gtacaccagt ctgatgaaca gctggcgctg 240 

atggatgact ggaacgtgcg cacccatacc gccagcgggc tggtagagcg cgtgaaagcg 300 

agcacgatgg gcgatcggga agctgaactg gcaacgctcg aatttttaaa acagtgggtg 360 

cctgcgggaa aatcgccgat ttgcggtaac agcatcggtc aggaccgtcg tttcctgttt 420 

aaatacatgc cggagctgga agcctacttc cactaccgtt atctcgatgt cagcaccctg 480 

aaagagctgg cgcgccgctg gaagccggaa attctggatg gttttaccaa gcaggggacg 540 

catcaggcga tggatgatat ccgtgaatcg gtggcggagc tggcttacta cctgtctctt 600 

atacacatct caaccctgaa gcttgcatgc ctgcaggtcg actctagagg atccccgcca 660 

cggttgatga gagctttgtt gtaggtggac cagttggtga ttttgaactt ttgctttgcc 720 

acggaacggt ctgcgttgtc gggaagatgc gtgatctgat ccttcaactc agcaaaagtt 780 

cgatttattc aacaaagccg ccgtcccgtc aagtcagcgt aatgctctgc cagtgttaca 840 

accaattaac caattctgat tagaaaaact catcgagcat caaatgaaac tgcaatttat 900

tcatatcagg attatcaata ccatattttt gaaaaagccg tttctgtaat gaaggagaaa 960 

actcaccgag gcagttccat aggatggcaa gatcctggta tcggtctgcg attccgactc 1020 

gtccaacatc aatacaacct attaatttcc cctcgtcaaa aataaggtta tcaagtgaga 1080 

aatcaccatg agtgacgact gaatccggtg agaatggcaa aagtttatgc atttctttcc 1140 

agacttgttc aacaggccag ccattacgct cgtcatcaaa atcactcgca tcaaccaaac 1200 

cgttattcat tcgtgattgc gcctgagcga gacgaaatac gcgatcgctg ttaaaaggac 1260 

aattacaaac aggaatcgaa tgcaaccggc gcaggaacac tgccagcgca tcaacaatat 1320 

tttcacctga atcaggatat tcttctaata cctggaatgc tgtttttccg gggatcgcag 1380 

tggtgagtaa ccatgcatca tcaggagtac ggataaaatg cttgatggtc ggaagaggca 1440 
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taaattccgt cagccagttt agtctgacca tctcatctgt aacatcattg gcaacgctac 1500 

ctttgccatg tttcagaaac aactctggcg catcgggctt cccatacaat cgatagattg 1560 

tcgcacctga ttgcccgaca ttatcgcgag cccatttata cccatataaa tcagcatcca 1620 

tgttggaatt taatcgcggc ctcgagcaag acgtttcccg ttgaatatgg ctcataacac 1680 

cccttgtatt actgtttatg taagcagaca gttttattgt tcatgatgat atatttttat 1740 

cttgtgcaat gtaacatcag agattttgag acacaattca tcgatgatgg ttgagatgtg 1800 

tataagagac aggcttacta ccgcgagcat tttatcaagc tgtaa 1845 

<210> 40 
<211> 2334 
<212> DNA 

<213> Escherichia coli 
<400> 40 

atgaagccaa tttttagccg tggcccgtcg ctacagattc gccttattct ggcggtgctg 60 

gtggcgctcg gcattattat tgccgacagc cgcctgggga cgttcagtca aatccgtact 120 

tatatggata ccgccgtcag tcctttctac tttgtttcca atgctcctcg tgaattgctg 180 

gatggcgtat cgcagacgct ggcctcgcgt gaccaattag aacttgaaaa ccgggcgtta 240 

cgtcaggaac tgttgctgaa aaacagtgaa ctgctgatgc ttggacaata caaacaggag 300 

aacgcgcgtc tgcgcgagct gctgggttcc ccgctgcgtc aggatgagca gaaaatggtg 360 

actcaggtta tctccacggt taacgatcct tatagcgatc aagttgttat cgataaaggt 420 

agcgttaatg gcgtttatga aggccagccg gtcatcagcg acaaaggtgt tgttggtcag 480 

gtggtggccg tcgctaaact gaccagtcgc gtgctgctga tttgtgatgc gacccacgcg 540 

ctgccaatcc aggtgctgcg caacgatatc cgcgtaattg cagccggtaa cggttgtacg 600 

gatgatttgc agcttgagca tctgccggcg aatacggata ttcgtgttgg tgatgtgctg 660 

gtgacttccg gtctgggcgg tcgtttcccg gaaggctatc cggtcgcggt tgtctcttcc 720

gtaaaactcg atacccagcg cgcttatact gtgattcagg cgcgtccgac tgcagggctg 780 

caacgtttgc gttatctgct gctgctgtgg ggggcagatc gtaacggcgc taacccgatg 840 

acgccggaag aggtgcatcg tgttgctaat gaacgtctga tgcagatgat gccgcaggta 900 

ttgccttcgc cagacgcgat ggggccaaag ttacctgaac cggcaacggg gatcgctcag 960 

ccgactccgc agcaaccggc gacaggaaat gcagctactg cgcctgctgc gccgacacag 1020 

cctctgtctc ttatacacat ctcaaccatc atcgatgaat tgtgtctcaa aatctctgat 1080 

gttacattgc acaagataaa aatatatcat catgaacaat aaaactgtct gcttacataa 1140 

acagtaatac aaggggtgtt atgagccata ttcaacggga aacgtcttgc tcgaggccgc 1200 

gattaaattc caacatggat gctgatttat atgggtataa atgggctcgc gataatgtcg 1260 

ggcaatcagg tgcgacaatc tatcgattgt atgggaagcc cgatgcgcca gagttgtttc 1320 

tgaaacatgg caaaggtagc gttgccaatg atgttacaga tgagatggtc agactaaact 1380 

ggctgacgga atttatgcct cttccgacca tcaagcattt tatccgtact cctgatgatg 1440 
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catggttact 
caccactgcg atccccggaa aaacagcatt ccaggtatta gaagaatatc 1500

ctgattcagg tgaaaatatt gttgatgcgc tggcagtgtt cctgcgccgg ttgcattcga 1560

ttcctgtttg taattgtcct tttaacagcg atcgcgtatt tcgtctcgct caggcgcaat 1620

cacgaatgaa taacggtttg gttgatgcga gtgattttga tgacgagcgt aatggctggc 1680

ctgttgaaca agtctggaaa gaaatgcata aacttttgcc attctcaccg gattcagtcg 1740

tcactcatgg tgatttctca cttgataacc ttatttttga cgaggggaaa ttaataggtt 1800

gtattgatgt tggacgagtc ggaatcgcag accgatacca ggatcttgcc atcctatgga 1860

actgcctcgg tgagttttct ccttcattac agaaacggct ttttcaaaaa tatggtattg 1920

ataatcctga tatgaataaa ttgcagtttc atttgatgct cgatgagttt ttctaatcag 1980


aattggttaa ttggttgtaa cactggcaga gcattacgct gacttgacgg gacggcggct 2040


ttgttgaata aatcgaactt ttgctgagtt gaaggatcag atcacgcatc ttcccgacaa 2100

cgcagaccgt tccgtggcaa agcaaaagtt caaaatcacc aactggtcca cctacaacaa 2160

agctctcatc aaccgtggcg gggatcctct agagtcgacc tgcaggcatg caagcttcag 2220

ggttgagatg tgtataagag acagacacag cctgctgcta atcgctctcc acaaagggct 2280

acgccgccgc aaagtggtgc tcaaccgcct gcgcgtgcgc cgggagggca atag 2334


<210> 41 
<211> 2676 
<212> DNA 

<213> Escherichia coli
<400> 41

atgcgaagtg aacagatttc tggctcgtca ctcaatccgt cttgtcgttt cagttcctgt 60

ctcttataca catctcaacc atcatcgatg aattgtgtct caaaatctct gatgttacat 120

tgcacaagat aaaaatatat catcatgaac aataaaactg tctgcttaca taaacagtaa 180

tacaaggggt gttatgagcc atattcaacg ggaaacgtct tgctcgaggc cgcgattaaa 240

ttccaacatg gatgctgatt tatatgggta taaatgggct cgcgataatg tcgggcaatc 300

aggtgcgaca atctatcgat tgtatgggaa gcccgatgcg ccagagttgt ttctgaaaca 360

tggcaaaggt agcgttgcca atgatgttac agatgagatg gtcagactaa actggctgac 420

ggaatttatg cctcttccga ccatcaagca ttttatccgt actcctgatg atgcatggtt 480

actcaccact gcgatccccg gaaaaacagc attccaggta ttagaagaat atcctgattc 540

aggtgaaaat attgttgatg cgctggcagt gttcctgcgc cggttgcatt cgattcctgt 600

ttgtaattgt ccttttaaca gcgatcgcgt atttcgtctc gctcaggcgc aatcacgaat 660

gaataacggt ttggttgatg cgagtgattt tgatgacgag cgtaatggct ggcctgttga 720

acaagtctgg aaagaaatgc ataaactttt gccattctca ccggattcag tcgtcactca 780

tggtgatttc tcacttgata accttatttt tgacgagggg aaattaatag gttgtattga 840

tgttggacga gtcggaatcg cagaccgata ccaggatctt gccatcctat ggaactgcct 900
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cggtgagttt 
tctccttcat tacagaaacg gctttttcaa aaatatggta ttgataatcc 960

tgatatgaat aaattgcagt ttcatttgat gctcgatgag tttttctaat cagaattggt 1020

taattggttg taacactggc agagcattac gctgacttga cgggacggcg gctttgttga 1080

ataaatcgaa cttttgctga gttgaaggat cagatcacgc atcttcccga caacgcagac 1140

cgttccgtgg caaagcaaaa gttcaaaatc accaactggt ccacctacaa caaagctctc 1200

atcaaccgtg gcggggatcc tctagagtcg acctgcaggc atgcaagctt cagggttgag 1260

atgtgtataa gagacagttt cagttctgcg tactctcctg tgaccaggca gcgaaaagac 1320

atgagtcgat gaccgtaaac aggcatggat gatcctgcca taccattcac aacattaagt 1380

tcgagattta ccccaagttt aagaactcac accactatga atcttaccga attaaagaat 1440

acgccggttt ctgagctgat cactctcggc gaaaatatgg ggctggaaaa cctggctcgt 1500

atgcgtaagc aggacattat ttttgccatc ctgaagcagc acgcaaagag tggcgaagat 1560

atctttggtg atggcgtact ggagatattg caggatggat ttggtttcct ccgttccgca 1620

gacagctcct acctcgccgg tcctgatgac atctacgttt cccctagcca aatccgccgt 1680

ttcaacctcc gcactggtga taccatctct ggtaagattc gcccgccgaa agaaggtgaa 1740

cgctattttg cgctgctgaa agttaacgaa gttaacttcg acaaacctga aaacgcccgc 1800

aacaaaatcc tctttgagaa cttaaccccg ctgcacgcaa actctcgtct gcgtatggaa 1860

cgtggtaacg gttctactga agatttaact gctcgcgtac tggatctggc atcacctatc 1920

ggtcgtggtc agcgtggtct gattgtggca ccgccgaaag ccggtaaaac catgctgctg 1980

cagaacattg ctcagagcat tgcttacaac cacccggatt gtgtgctgat ggttctgctg 2040

atcgacgaac gtccggaaga agtaaccgag atgcagcgtc tggtaaaagg tgaagttgtt 2100

gcttctacct ttgacgaacc cgcatctcgc cacgttcagg ttgcggaaat ggtgatcgag 2160

aaggccaaac gcctggttga gcacaagaaa gacgttatca ttctgctcga ctccatcact 2220

cgtctggcgc gcgcttacaa caccgttgtt ccggcgtcag gtaaagtgtt gaccggtggt 2280

gtggatgcca acgccctgca tcgtccgaaa cgcttctttg gtgcggcgcg taacgtggaa 2340

gagggcggca gcctgaccat tatcgcgacg gcgcttatcg ataccggttc taaaatggac 2400

gaagttatct acgaagagtt taaaggtaca ggcaacatgg aactgcacct ctctcgtaag 2460

atcgctgaaa aacgcgtctt cccggctatc gactacaacc gttctggtac ccgtaaagaa 2520

gagctgctca cgactcagga agaactgcag aaaatgtgga tcctgcgcaa aatcattcac 2580

ccgatgggcg aaatcgatgc aatggaattc ctcattaata aactggcaat gaccaagacc 2640

aatgacgatt tcttcgaaat gatgaaacgc tcataa 2676


<210> 42
<211> 1746
<212> DNA

<213> Escherichia coli
<400> 42 

atggattact tcaccctctt tggcttgcct gcccgctatc aactcgatac ccaggcgctg 60 
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agcctgcgtt ttcaggatct acaacgtcag tatcatcctg ataaattcgc cagcggaagc 120 

caggcggaac aactcgccgc cgtacagcaa tctgcaacca ttaaccaggc ctggcaaacg 180 

ctgcgtcatc cgttaatgcg cgcggaatat ttgctttctt tgcacggctt tgatctcgcc 240 

agcgagcagc atacctgtct cttatacaca tctcaaccat catcgatgaa ttgtgtctca 300 

aaatctctga tgttacattg cacaagataa aaatatatca tcatgaacaa taaaactgtc 360 

tgcttacata aacagtaata caaggggtgt tatgagccat attcaacggg aaacgtcttg 420 

ctcgaggccg cgattaaatt ccaacatgga tgctgattta tatgggtata aatgggctcg 480 

cgataatgtc gggcaatcag gtgcgacaat ctatcgattg tatgggaagc ccgatgcgcc 540 

agagttgttt ctgaaacatg gcaaaggtag cgttgccaat gatgttacag atgagatggt 600 

cagactaaac tggctgacgg aatttatgcc tcttccgacc atcaagcatt ttatccgtac 660 

tcctgatgat gcatggttac tcaccactgc gatccccgga aaaacagcat tccaggtatt 720 

agaagaatat cctgattcag gtgaaaatat tgttgatgcg ctggcagtgt tcctgcgccg 780 

gttgcattcg attcctgttt gtaattgtcc ttttaacagc gatcgcgtat ttcgtctcgc 840 

tcaggcgcaa tcacgaatga ataacggttt ggttgatgcg agtgattttg atgacgagcg 900 

taatggctgg cctgttgaac aagtctggaa agaaatgcat aaacttttgc cattctcacc 960 

ggattcagtc gtcactcatg gtgatttctc acttgataac cttatttttg acgaggggaa 1020 

attaataggt tgtattgatg ttggacgagt cggaatcgca gaccgatacc aggatcttgc 1080 

catcctatgg aactgcctcg gtgagttttc tccttcatta cagaaacggc tttttcaaaa 1140 

atatggtatt gataatcctg atatgaataa attgcagttt catttgatgc tcgatgagtt 1200 

tttctaatca gaattggtta attggttgta acactggcag agcattacgc tgacttgacg 1260 

ggacggcggc tttgttgaat aaatcgaact tttgctgagt tgaaggatca gatcacgcat 1320 

cttcccgaca acgcagaccg ttccgtggca aagcaaaagt tcaaaatcac caactggtcc 1380 

acctacaaca aagctctcat caaccgtggc ggggatcctc tagagtcgac ctgcaggcat 1440 

gcaagcttca gggttgagat gtgtataaga gacaggcagc atactgtgcg cgacaccgcg 1500 

ttcctgatgg aacagttgga gctgcgcgaa gagctggacg agatcgaaca ggcgaaagat 1560 

gaagcgcggc tggaaagctt tatcaaacgt gtgaaaaaga tgtttgatac ccgccatcag 1620 

ttgatggttg aacagttaga caacgagacg tgggacgcgg cggcggatac cgtgcgtaag 1680 

ctgcgttttc tcgataaact gcgaagcagt gccgaacaac tcgaagaaaa actgctcgat 1740 

ttttaa 1746 

<210> 43

<211> 8609 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Reporter plasmid pPCB15
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<400> 43 
cgtatggcaa tgaaagacgg tgagctggtg atatgggata gtgttcaccc ttgttacacc 60

gttttccatg agcaaactga aacgttttca tcgctctgga gtgaatacca cgacgatttc 120

cggcagtttc tacacatata ttcgcaagat gtggcgtgtt acggtgaaaa cctggcctat 180

ttccctaaag ggtttattga gaatatgttt ttcgtctcag ccaatccctg ggtgagtttc 240

accagttttg atttaaacgt ggccaatatg gacaacttct tcgcccccgt tttcaccatg 300

ggcaaatatt atacgcaagg cgacaaggtg ctgatgccgc tggcgattca ggttcatcat 360

gccgtctgtg atggcttcca tgtcggcaga atgcttaatg aattacaaca gtactgcgat 420

gagtggcagg gcggggcgta atttttttaa ggcagttatt ggtgcctaga aatattttat 480

ctgattaata agatgatctt cttgagatcg ttttggtctg cgcgtaatct cttgctctga 540

aaacgaaaaa accgccttgc agggcggttt ttcgaaggtt ctctgagcta ccaactcttt 600

gaaccgaggt aactggcttg gaggagcgca gtcaccaaaa cttgtccttt cagtttagcc 660

ttaaccggcg catgacttca agactaactc ctctaaatca attaccagtg gctgctgcca 720

gtggtgcttt tgcatgtctt tccgggttgg actcaagacg atagttaccg gataaggcgc 780

agcggtcgga ctgaacgggg ggttcgtgca tacagtccag cttggagcga actgcctacc 840

cggaactgag tgtcaggcgt ggaatgagac aaacgcggcc ataacagcgg aatgacaccg 900

gtaaaccgaa aggcaggaac aggagagcgc acgagggagc cgccagggga aacgcctggt 960

atctttatag tcctgtcggg tttcgccacc actgatttga gcgtcagatt tcgtgatgct 1020

tgtcaggggg gcggagccta tggaaaaacg gctttgccgc ggccctctca cttccctgtt 1080

aagtatcttc ctggcatctt ccaggaaatc tccgccccgt tcgtaagcca tttccgctcg 1140

ccgcagtcga acgaccgagc gtagcgagtc agtgagcgag gaagcggaat atatcctgta 1200

tcacatattc tgctgacgca ccggtgcagc cttttttctc ctgccacatg aagcacttca 1260

ctgacaccct catcagtgcc aacatagtaa gccagtatat acactccgct agcgcccaat 1320

acgcaaaccg cctctccccg cgcgttggcc gattcattaa tgcagctggc acgacaggtt 1380

tcccgactgg aaagcgggca gtgagcgcaa cgcaattaat gtgagttagc tcactcatta 1440

ggcaccccag gctttacact ttatgcttcc ggctcgtatg ttgtgtggaa ttgtgagcgg 1500

ataacaattt cacacaggaa acagctatga ccatgattac gaattcgagc tcggtaccca 1560

aacgaattcg cccttttgac ggtctgcgca aaaaaacacg ttcaccttac tggcatttcg 1620

gctgagcagt tgctggctga tatcgatagc cgccttgatc agttactgcc ggttcagggt 1680

gagcgggatt gtgtgggtgc cgcgatgcgt gaaggcacgc tggcaccggg caaacgtatt 1740

cgtccgatgc tgctgttatt aacagcgcgc gatcttggct gtgcgatcag tcacggggga 1800

ttactggatt tagcctgcgc ggttgaaatg gtgcatgctg cctcgctgat tctggatgat 1860

atgccctgca tggacgatgc gcagatgcgt cgggggcgtc ccaccattca cacgcagtac 1920

ggtgaacatg tggcgattct ggcggcggtc gctttactca gcaaagcgtt tggggtgatt 1980

gccgaggctg aaggtctgac gccgatagcc aaaactcgcg cggtgtcgga
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gcgattggca 


tgcagggtct 


ggttcagggc cagtttaagg 


acctctcgga 


aggcgataaa 


2100 


ccccgcagcg 


ccgatgccat 


actgctaacc aatcagttta 


aaaccagcac 


gctgttttgc 


2160 


gcgtcaacgc 


aaatggcgtc 


cattgcggcc aacgcgtcct 


gcgaagcgcg 


tgagaacctg 


2220 


catcgtttct 


cgctcgatct 


cggccaggcc tttcagttgc 


ttgacgatct 


taccgatggc 


2280 


atgaccgata 


ccggcaaaga 


catcaatcag gatgcaggta 


aatcaacgct 


ggtcaattta 


2340 


ttaggctcag 


gcgcggtcga 


agaacgcctg cgacagcatt 


tgcgcctggc 


cagtgaacac 


2400 


ctttccgcgg 


catgccaaaa 


cggccattcc accacccaac 


tttttattca 


ggcctggttt 


2460 


gacaaaaaac 


tcgctgccgt 


cagttaagga tgctgcatga 


gccattttgc 


ggtgatcgca 


2520 


ccgccctttt 


tcagccatgt 


tcgcgctctg caaaaccttg 


ctcaggaatt 


agtggcccgc 


2580 


ggtcatcgtg 


ttacgttttt 


tcagcaacat gactgcaaag 


cgctggtaac 


gggcagcgat 


2640 


atcggattcc 


agaccgtcgg 


actgcaaacg catcctcccg 


gttccttatc 


gcacctgctg 


2700 


cacctggccg 


cgcacccact 


cggaccctcg atgttacgac 


tgatcaatga 


aatggcacgt 


2760 


accagcgata 


tgctttgccg 


ggaactgccc gccgcttttc 


atgcgttgca 


gatagagggc 


2820 


gtgatcgttg 


atcaaatgga 


gccggcaggt gcagtagtcg 


cagaagcgtc 


aggtctgccg 


2880 


tttgtttcgg 


tggcctgcgc 


gctgccgctc aaccgcgaac 


cgggtttgcc 


tctggcggtg 


2940 


atgcctttcg 


agtacggcac 


cagcgatgcg gctcgggaac 


gctataccac 


cagcgaaaaa 


3000 


atttatgact 


ggctgatgcg 


acgtcacgat cgtgtgatcg 


cgcatcatgc 


atgcagaatg 


3060 


ggtttagccc 


cgcgtgaaaa 


actgcatcat tgtttttctc 


cactggcaca 


aatcagccag 


3120 


ttgatccccg aactggattt tccccgcaaa gcgctgccag actgctttca tgcggttgga 3180

ccgttacggc aaccccaggg gacgccgggg tcatcaactt cttattttcc gtccccggac 3240

aaaccccgta tttttgcctc gctgggcacc ctgcagggac atcgttatgg cctgttcagg 3300

accatcgcca aagcctgcga agaggtggat gcgcagttac tgttggcaca ctgtggcggc 3360

ctctcagcca cgcaggcagg tgaactggcc cggggcgggg acattcaggt tgtggatttt 3420

gccgatcaat ccgcagcact ttcacaggca cagttgacaa tcacacatgg tgggatgaat 3480

acggtactgg acgctattgc ttcccgcaca ccgctactgg cgctgccgct ggcatttgat 3540

caacctggcg tggcatcacg aattgtttat catggcatcg gcaagcgtgc gtctcggttt 3600

actaccagcc atgcgctggc gcggcagatt cgatcgctgc tgactaacac cgattacccg 3660

cagcgtatga caaaaattca ggccgcattg cgtctggcag gcggcacacc agccgccgcc 3720

gatattgttg aacaggcgat gcggacctgt cagccagtac tcagtgggca ggattatgca 3780

accgcactat gatctcattc tggtcggtgc cggtctggct aatggcctta tcgcgctccg 3840

gcttcagcaa cagcatccgg atatgcggat cttgcttatt gaggcgggtc ctgaggcggg 3900

agggaaccat acctggtcct ttcacgaaga ggatttaacg ctgaatcagc atcgctggat 3960

agcgccgctt gtggtccatc actggcccga ctaccaggtt cgtttccccc aacgccgtcg 4020

ccatgtgaac agtggctact actgcgtgac ctcccggcat ttcgccggga tactccggca 4080

acagtttgga caacatttat ggctgcatac cgcggtttca gccgttcatg ctgaatcggt 4140

ccagttagcg gatggccgga ttattcatgc cagtacagtg atcgacggac ggggttacac 4200

gcctgattct gcactacgcg taggattcca ggcatttatc ggtcaggagt ggcaactgag 4260

cgcgccgcat ggtttatcgt caccgattat catggatgcg acggtcgatc agcaaaatgg 4320

ctaccgcttt gtttataccc tgccgctttc cgcaaccgca ctgctgatcg aagacacaca 4380

ctacattgac aaggctaatc ttcaggccga acgggcgcgt cagaacattc gcgattatgc 4440

tgcgcgacag ggttggccgt tacagacgtt gctgcgggaa gaacagggtg cattgcccat 4500

tacgttaacg ggcgataatc gtcagttttg gcaacagcaa ccgcaagcct gtagcggatt 4560

acgcgccggg ctgtttcatc cgacaaccgg ctactcccta ccgctcgcgg tggcgctggc 4620

cgatcgtctc agcgcgctgg atgtgtttac ctcttcctct gttcaccaga cgattgctca 4680

ctttgcccag caacgttggc agcaacaggg gtttttccgc atgctgaatc gcatgttgtt 4740

tttagccgga ccggccgagt cacgctggcg tgtgatgcag cgtttctatg gcttacccga 4800

ggatttgatt gcccgctttt atgcgggaaa actcaccgtg accgatcggc tacgcattct 4860

gagcggcaag ccgcccgttc ccgttttcgc ggcattgcag gcaattatga cgactcatcg 4920

ttgaagagcg actacatgaa accaactacg gtaattggtg cgggctttgg tggcctggca 4980

ctggcaattc gtttacaggc cgcaggtatt cctgttttgc tgcttgagca gcgcgacaag 5040

ccgggtggcc gggcttatgt ttatcaggag cagggcttta cttttgatgc aggccctacc 5100

gttatcaccg atcccagcgc gattgaagaa ctgtttgctc tggccggtaa acagcttaag 5160

gattacgtcg agctgttgcc ggtcacgccg ttttatcgcc tgtgctggga gtccggcaag 5220

gtcttcaatt acgataacga ccaggcccag ttagaagcgc agatacagca gtttaatccg 5280

cgcgatgttg cgggttatcg agcgttcctt gactattcgc gtgccgtatt caatgagggc 5340

tatctgaagc tcggcactgt gcctttttta tcgttcaaag acatgcttcg ggccgcgccc 5400

cagttggcaa agctgcaggc atggcgcagc gtttacagta aagttgccgg ctacattgag 5460

gatgagcatc ttcggcaggc gttttctttt cactcgctct tagtgggggg gaatccgttt 5520

gcaacctcgt ccatttatac gctgattcac gcgttagaac gggaatgggg cgtctggttt 5580

ccacgcggtg gaaccggtgc gctggtcaat ggcatgatca agctgtttca ggatctgggc 5640

ggcgaagtcg tgcttaacgc ccgggtcagt catatggaaa ccgttgggga caagattcag 5700

gccgtgcagt tggaagacgg cagacggttt gaaacctgcg cggtggcgtc gaacgctgat 5760

gttgtacata cctatcgcga tctgctgtct cagcatcccg cagccgctaa gcaggcgaaa 5820

aaactgcaat ccaagcgtat gagtaactca ctgtttgtac tctattttgg tctcaaccat 5880

catcacgatc aactcgccca tcataccgtc tgttttgggc cacgctaccg tgaactgatt 5940

cacgaaattt ttaaccatga tggtctggct gaggattttt cgctttattt acacgcacct 6000

tgtgtcacgg atccgtcact ggcaccggaa gggtgcggca gctattatgt gctggcgcct 6060

gttccacact taggcacggc gaacctcgac tgggcggtag aaggaccccg actgcgcgat 6120

cgtatttttg actaccttga gcaacattac atgcctggct tgcgaagcca gttggtgacg 6180

caccgtatgt ttacgccgtt cgatttccgc gacgagctca atgcctggca aggttcggcc 6240 

ttctcggttg aacctattct gacccagagc gcctggttcc gaccacataa ccgcgataag 6300 

cacattgata atctttatct ggttggcgca ggcacccatc ctggcgcggg cattcccggc 6360 

gtaatcggct cggcgaaggc gacggcaggc ttaatgctgg aggacctgat ttgacgaata 6420 

cgtcattact gaatcatgcc gtcgaaacca tggcggttgg ctcgaaaagc tttgcgactg 6480 

catcgacgct tttcgacgcc aaaacccgtc gcagcgtgct gatgctttac gcatggtgcc 6540 

gccactgcga cgacgtcatt gacgatcaaa cactgggctt tcatgccgac cagccctctt 6600 

cgcagatgcc tgagcagcgc ctgcagcagc ttgaaatgaa aacgcgtcag gcctacgccg 6660 

gttcgcaaat gcacgagccc gcttttgccg cgtttcagga ggtcgcgatg gcgcatgata 6720 

tcgctcccgc ctacgcgttc gaccatctgg aaggttttgc catggatgtg cgcgaaacgc 6780 

gctacctgac actggacgat acgctgcgtt attgctatca cgtcgccggt gttgtgggcc 6840 

tgatgatggc gcaaattatg ggcgttcgcg ataacgccac gctcgatcgc gcctgcgatc 6900 

tcgggctggc tttccagttg accaacattg cgcgtgatat tgtcgacgat gctcaggtgg 6960 

gccgctgtta tctgcctgaa agctggctgg aagaggaagg actgacgaaa gcgaattatg 7020 

ctgcgccaga aaaccggcag gccttaagcc gtatcgccgg gcgactggta cgggaagcgg 7080 

aaccctatta cgtatcatca atggccggtc tggcacaatt acccttacgc tcggcctggg 7140 

ccatcgcgac agcgaagcag gtgtaccgta aaattggcgt gaaagttgaa caggccggta 7200 

agcaggcctg ggatcatcgc cagtccacgt ccaccgccga aaaattaacg cttttgctga 7260 

cggcatccgg tcaggcagtt acttcccgga tgaagacgta tccaccccgt cctgctcatc 7320

tctggcagcg cccgatctag ccgcatgcct ttctctcagc gtcgcctgaa gtttagataa 7380

cggtggcgcg tacagaaaac caaaggacac gcagccctct tttcccctta cagcatgatg 7440 

catacggtgg gccatgtata accgtttcag gtagcctttg cgcggtatgt agcggaacgg 7500 

ccagcgctgg tgtaccagtc cgtcgtggac cataaaatac agtaaaccat aagcggtcat 7560 

gcctgcacca atccactgga gcggccagat tcctgtactg ccgaagtaaa tcagggcaat 7620 

cgacacaatg gcgaatacca cggcatagag atcgttaact tcaaatgcgc ctttacgcgg 7680 

ttcatgatgt gaaagatgcc agccccaacc ccagccgtgc atgatgtatt tatgtgccag 7740 

tgcagcaacc acttccatgc cgaccacggt gacaaacacg atcagggcat tccaaatcca 7800 

caacataatt tctcaagggc gaattcgcgg ggatcctcta gagtcgacct gcaggcatgc 7860 

aagcttggca ctggccgtcg ttttacaacg tcgtgactgg gaaaaccctg gcgttaccca 7920 

acttaatcgc cttgcagcac atcccccttt cgccagctgg cgtaatagcg aagaggcccg 7980

caccgatcgc ccttcccaac agttgcgcag cctgaatggc gaatggcgct gatgtccggc 8040 

ggtgcttttg ccgttacgca ccaccccgtc agtagctgaa caggagggac agctgataga 8100 

aacagaagcc actggagcac ctcaaaaaca ccatcataca ctaaatcagt aagttggcag 8160 
catcacccga cgcactttgc gccgaataaa tacctgtgac ggaagatcac ttcgcagaat 8220

aaataaatcc tggtgtccct gttgataccg ggaagccctg ggccaacttt tggcgaaaat 8280 

gagacgttga tcggcacgta agaggttcca actttcacca taatgaaata agatcactac 8340 

cgggcgtatt ttttgagtta tcgagatttt caggagctaa ggaagctaaa atggagaaaa 8400 

aaatcactgg atataccacc gttgatatat cccaatggca tcgtaaagaa cattttgagg 8460 

catttcagtc agttgctcaa tgtacctata accagaccgt tcagctggat attacggcct 8520 

ttttaaagac cgtaaagaaa aataagcaca agttttatcc ggcctttatt cacattcttg 8580 

cccgcctgat gaatgctcat ccggaattt 8609 